Preface


Multivariate t distributions have attracted somewhat limited attention
from researchers over the last 70 years in spite of their increasing
importance in classical as well as in Bayesian statistical modeling.
These distributions have been, perhaps unjustly, overshadowed during all
these years by the multivariate normal distribution. Both the
multivariate t and the multivariate normal are members of the general
family of elliptically symmetric distributions. However, we feel that it
is desirable to focus on these distributions separately for several
reasons:


• Multivariate t distributions are generalizations of the classical
  univariate Student t distribution, which is of central importance in
  statistical inference. The possible structures are numerous, and each
  one possesses special characteristics as far as potential and current
  applications are concerned.

• Application of multivariate t distributions is a very promising
  approach in multivariate analysis. Classical multivariate analysis is
  soundly and rigidly tilted toward the multivariate normal
  distribution, while multivariate t distributions offer a more viable
  alternative with respect to real-world data, particularly because
  their tails are more realistic. We have seen recently some unexpected
  applications in novel areas such as cluster analysis, discriminant
  analysis, multiple regression, robust projection indices, and missing
  data imputation.

• Multivariate t distributions for the past 20 to 30 years have played a
  crucial role in Bayesian analysis of multivariate data. They serve by
  now as the most popular prior distribution (because elicitation of prior
  information in various physical, engineering, and financial phenomena
  is closely associated with multivariate t distributions) and generate
  meaningful posterior distributions. This diversity and the apparent
  ease of applications require careful analysis of the properties of the
  distribution in order to avoid pitfalls and misrepresentation.


The compilation of this book was a somewhat daunting task (as our 
Contents indicates). Indeed, the scope of the multivariate t distribu- 
tions is unsurpassed, and, although there are books dealing with multi- 
variate continuous distributions and review articles in the Encyclopedia 
of Statistical Sciences and Biostatistics, the material presented in these 
sources is quite limited. 

Our goal was to collect and present in an organized and user-friendly 
manner all of the relevant information available in the literature worthy 
of publication. It is our hope that the readers — both novices and experts 
— will find the book useful. Our thanks are due to numerous authors who 
generously supplied us with their contributions and to Lauren Cowles, 
Elise Oranges and Lara Zoble at Cambridge University Press for their 
guidance. We also wish to thank Anusha Thiyagarajah for help with 
editing. 


Samuel Kotz 
Saralees Nadarajah 
Introduction


1.1 Definition 


There exist quite a few forms of multivariate t distributions, which will
be discussed in subsequent chapters. In this chapter, however, we shall
describe the most common and natural form. It directly generalizes the
univariate Student's t distribution in the same manner that the
multivariate normal distribution generalizes the univariate normal
distribution.

A p-dimensional random vector $X = (X_1, \ldots, X_p)^T$ is said to have the
p-variate t distribution with degrees of freedom $\nu$, mean vector $\mu$, and
correlation matrix $R$ (and with $\Sigma$ denoting the corresponding covariance
matrix) if its joint probability density function (pdf) is given by

$$
f(x) = \frac{\Gamma((\nu+p)/2)}{(\pi\nu)^{p/2}\,\Gamma(\nu/2)\,|R|^{1/2}}
\left[1 + \frac{1}{\nu}(x-\mu)^T R^{-1}(x-\mu)\right]^{-(\nu+p)/2}.
\tag{1.1}
$$


The degrees of freedom parameter $\nu$ is also referred to as the shape
parameter, because the peakedness of (1.1) may be diminished, preserved,
or increased by varying $\nu$ (see Section 1.4). The distribution is said to
be central if $\mu = 0$; otherwise, it is said to be noncentral.
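
As a minimal sketch of how (1.1) can be evaluated numerically, the following Python function computes the density in the bivariate case $p = 2$, assuming a correlation matrix with a single off-diagonal entry `rho`. The function name and its arguments are illustrative choices, not part of any library, and only the standard library is used.

```python
import math

def bivariate_t_pdf(x1, x2, nu, rho, mu1=0.0, mu2=0.0):
    """Density (1.1) for p = 2 with correlation matrix R = [[1, rho], [rho, 1]]."""
    p = 2
    det_r = 1.0 - rho * rho                      # |R| for the 2x2 correlation matrix
    d1, d2 = x1 - mu1, x2 - mu2
    # Quadratic form (x - mu)^T R^{-1} (x - mu), written out for the 2x2 case.
    quad = (d1 * d1 - 2.0 * rho * d1 * d2 + d2 * d2) / det_r
    # Logarithm of the normalizing constant in (1.1).
    log_norm = (math.lgamma((nu + p) / 2.0) - math.lgamma(nu / 2.0)
                - (p / 2.0) * math.log(math.pi * nu) - 0.5 * math.log(det_r))
    return math.exp(log_norm - (nu + p) / 2.0 * math.log1p(quad / nu))
```

For `rho = 0` the value at the mean vector equals $1/(2\pi)$ for every $\nu$, because $\Gamma((\nu+2)/2) = (\nu/2)\Gamma(\nu/2)$; this gives a quick sanity check of the normalizing constant.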

Note that if $p = 1$, $\mu = 0$, and $R = 1$, then (1.1) is the pdf of the
univariate Student's t distribution with degrees of freedom $\nu$. These
univariate marginals have increasingly heavy tails as $\nu$ decreases toward
unity. With or without moments, the marginals become successively less
peaked about $0 \in \mathbb{R}$ as $\nu \downarrow 1$.

If $p = 2$, then (1.1) is a slight modification of the bivariate surface of
Pearson (1923). If $\nu = 1$, then (1.1) is the p-variate Cauchy distribution.
If $(\nu + p)/2 = m$, an integer, then (1.1) is the p-variate Pearson type VII
distribution. The limiting form of (1.1) as $\nu \to \infty$ is the joint pdf of the
p-variate normal distribution with mean vector $\mu$ and covariance matrix
$\Sigma$. Hence, (1.1) can be viewed as an approximation of the multivariate
normal distribution. The particular case of (1.1) for $\mu = 0$ and $R = I_p$
is a mixture of the normal density with zero means and covariance
matrix $\nu I_p$, the mixing being in the scale parameter $\nu$. The class of
elliptically contoured distributions (see, for example, Fang et al., 1990)
contains (1.1) as a particular case. Also (1.1) has the attractive property
of being Schur-concave when elements of $R$ satisfy $r_{ij} = \rho$, $i \neq j$ (see
Marshall and Olkin, 1974). Namely, if $a$ and $b$ are two p-variate vectors
with components ordered to achieve $a_1 \ge a_2 \ge \cdots \ge a_p$ and
$b_1 \ge b_2 \ge \cdots \ge b_p$, and if this ordering implies
$\sum_{i=1}^{k} a_i \le \sum_{i=1}^{k} b_i$ for $k = 1, 2, \ldots, p - 1$ and
$\sum_{i=1}^{p} a_i \le \sum_{i=1}^{p} b_i$, then (1.1) satisfies $f(a) \ge f(b)$.

In Bayesian analyses, (1.1) arises as: (1) the posterior distribution of 
the mean of a multivariate normal distribution (Geisser and Cornfield, 
1963; see also Stone, 1964); (2) the marginal posterior distribution of 
the regression coefficient vector of the traditional multivariate regres- 
sion model (Tiao and Zellner, 1964); (3) the marginal prior distribution 
of the mean of a multinormal process (Ando and Kaufman, 1965); (4) 
the marginal posterior distribution of the mean and the predictive dis- 
tribution of a future observation of the multivariate normal structural 
model (Fraser and Haq, 1969); (5) an approximation to posterior dis- 
tributions arising in location-scale regression models (Sweeting, 1984, 
1987); and (6) the prior distribution for set estimation of a multivariate 
normal mean (DasGupta et al., 1995). Additional applications of (1.1) 
can be seen in the numerous books dealing with the Bayesian aspects of 
multivariate analysis. 


1.2 Representations 
If $X$ has the p-variate t distribution with degrees of freedom $\nu$, mean
vector $\mu$, and correlation matrix $R$, then it can be represented in the
following ways:

• If $Y$ is a p-variate normal random vector with mean 0 and covariance
  matrix $\Sigma$, and if $\nu S^2/\sigma^2$ is the chi-squared random variable with
  degrees of freedom $\nu$, independent of $Y$, then

  $$X = S^{-1} Y + \mu. \tag{1.2}$$

  This implies that $X \mid S = s$ has the p-variate normal distribution with
  mean vector $\mu$ and covariance matrix $(1/s^2)\Sigma$.

Fig. 1.1. Joint contours of (1.1) with degrees of freedom $\nu = 1$, zero means,
and correlation coefficient $\rho = 0.8, 0.6, \ldots, -0.6, -0.8$

Fig. 1.2. Joint contours of (1.1) with degrees of freedom $\nu = 2$, zero means,
and correlation coefficient $\rho = 0.8, 0.6, \ldots, -0.6, -0.8$

Fig. 1.3. Joint contours of (1.1) with degrees of freedom $\nu = 10$, zero means,
and correlation coefficient $\rho = 0.8, 0.6, \ldots, -0.6, -0.8$

Fig. 1.4. Joint contours of (1.1) with degrees of freedom $\nu = 30$, zero means,
and correlation coefficient $\rho = 0.8, 0.6, \ldots, -0.6, -0.8$

• If $V^{1/2}$ is the symmetric square root of $V$, that is,

  $$V^{1/2} V^{1/2} = V \sim W_p(R^{-1}, \nu + p - 1), \tag{1.3}$$

  where $W_p(\Sigma, n)$ denotes the p-variate Wishart distribution with
  degrees of freedom $n$ and covariance matrix $\Sigma$, and if $Y$ has the p-variate
  normal distribution with zero means and covariance matrix $\nu I_p$ ($I_p$ is
  the p-dimensional identity matrix), independent of $V$, then

  $$X = (V^{1/2})^{-1} Y + \mu \tag{1.4}$$

  (Ando and Kaufman, 1965). This implies that $X \mid V$ has the p-variate
  normal distribution with mean vector $\mu$ and covariance matrix $\nu V^{-1}$.
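
Representation (1.2) doubles as a sampling recipe. The sketch below draws from the bivariate t under the assumptions $\sigma = 1$ and $\Sigma = R = [[1, \rho], [\rho, 1]]$; the helper names `sample_bivariate_t` and `sample_moments` are hypothetical, and the chi-squared draw is obtained from the standard-library gamma generator.

```python
import math
import random

def sample_bivariate_t(nu, rho, mu=(0.0, 0.0), rng=random):
    """One draw from the bivariate t via X = Y / S + mu, as in (1.2) with sigma = 1."""
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    # Y ~ N(0, R), built from the Cholesky factor of [[1, rho], [rho, 1]].
    y1 = z1
    y2 = rho * z1 + math.sqrt(1.0 - rho * rho) * z2
    # nu * S^2 ~ chi-squared(nu); a chi-squared(nu) draw is Gamma(shape=nu/2, scale=2).
    s = math.sqrt(rng.gammavariate(nu / 2.0, 2.0) / nu)
    return (y1 / s + mu[0], y2 / s + mu[1])

def sample_moments(nu, rho, n, seed=0):
    """Sample variances and covariance computed from n seeded draws."""
    rng = random.Random(seed)
    draws = [sample_bivariate_t(nu, rho, rng=rng) for _ in range(n)]
    m1 = sum(x for x, _ in draws) / n
    m2 = sum(y for _, y in draws) / n
    var1 = sum((x - m1) ** 2 for x, _ in draws) / (n - 1)
    var2 = sum((y - m2) ** 2 for _, y in draws) / (n - 1)
    cov = sum((x - m1) * (y - m2) for x, y in draws) / (n - 1)
    return var1, var2, cov
```

With $\nu > 2$ the sample variances should settle near $\nu/(\nu - 2)$ and the sample covariance near $\rho\,\nu/(\nu - 2)$, in line with the moments derived in Section 1.7.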


1.3 Characterizations 
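From representation (1.2) it easily follows for any $a \neq 0$ that $X$ has the
joint pdf (1.1) if and only if

$$
\begin{aligned}
& X \mid S^2 = s^2 \sim N(\mu, s^{-2}\Sigma) \\
\Leftrightarrow\ & (a^T \Sigma a)^{-1/2} a^T (X - \mu) \mid S^2 = s^2 \sim N(0, s^{-2}) \\
\Leftrightarrow\ & (a^T \Sigma a)^{-1/2} a^T (X - \mu) \sim t_\nu,
\end{aligned}
$$

and this is one of the earliest characterization results given in Cornish
(1962). This result can also be obtained by using the representation
(1.4): $X$ has the joint pdf (1.1) if and only if

$$
\begin{aligned}
& X \mid V \sim N(\mu, \nu V^{-1}) \\
\Leftrightarrow\ & (a^T \Sigma a)^{-1/2} a^T (X - \mu) \mid V \sim N\left(0, \nu\,(a^T V^{-1} a)/(a^T \Sigma a)\right) \\
\Leftrightarrow\ & (a^T \Sigma a)^{-1/2} a^T (X - \mu) \sim t_\nu,
\end{aligned}
$$

as noted by Lin (1972).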

Lin (1972) obtained two further characterizations using the
representation (1.2). Let $\nu S^2 \sim \chi^2_\nu$ and let $X_1, X_2, \ldots, X_p$ be
conditionally independent continuous random variables symmetrically
distributed with $E(X_k \mid S^2 = s^2) = \mu_k$ and
$\mathrm{Var}(X_k \mid S^2 = s^2) = \sigma_k^2/s^2 < \infty$ for $k = 1, \ldots, p$. Then the
following characterizations are valid:

• $(X_1, X_2, \ldots, X_p)^T$ has the joint pdf (1.1) with mean vector $\mu$,
  covariance matrix $D$, and degrees of freedom $\nu$ if and only if

  $$\sum_{k=1}^{p} \frac{(X_k - \mu_k)^2}{p\,\sigma_k^2} \sim F_{p,\nu},$$

  where $D$ is a $p \times p$ diagonal matrix with its kth diagonal element equal
  to $\sigma_k^2$.

• In the special case where $\sigma_k^2 = \sigma^2$ for all $k$ and the conditional pdf
  of $X_k \mid S^2 = s^2$ is positive and differentiable for all $x \in \mathbb{R}$,
  $(X_1, X_2, \ldots, X_p)^T$ has the joint pdf (1.1) with zero means, covariance
  matrix $\sigma^2 I_p$, and degrees of freedom $\nu$ if and only if the joint pdf of
  $X_1, X_2, \ldots, X_p$ is a function of $x_1^2 + x_2^2 + \cdots + x_p^2$ only.


1.4 A Closure Property 


Consider Studentizing transformations $T : \mathbb{R}^n \to \mathbb{R}^k$, depending on
matrices $A\,(n \times k)$, $B\,(n \times \nu)$ and $Q\,(n \times n)$, given by


ATX 


TX) = Terx] 


(1.5) 


such that $A^T Q B = 0$. Jensen (1994) established that the class of
multivariate t distributions is closed under the transform $T(\cdot)$.
Specifically, assume $A^T A = I_k$, $B^T B = I_\nu$, and $X$ is distributed
according to (1.1) with zero means, correlation matrix $I_n$, and degrees of
freedom $m$. Under these assumptions, Jensen showed that $T(X)$ is also
distributed according to (1.1) with zero means, correlation matrix $I_k$,
and degrees of freedom $\nu$.

Jensen (1994) also studied the concentration properties of (1.1) via
peakedness by varying its parameters. If $X$ is multivariate normal, then
the transformation $X \to T(X)$ diminishes the peakedness. If, on the
other hand, $X$ is distributed according to (1.1) with mean vector $\mu 1_n$,
covariance matrix $\sigma^2 I_n$, and degrees of freedom $m$, then the
transformation is peakedness-enhancing for all $m < \nu$. If $m > \nu > 2$, then
the transformation serves to increase variances. For any $m > \nu > 0$
the marginal distributions are less peaked after $T(X)$ than before in the
sense of Birnbaum (1948). If $m = \nu$, then the marginals are identical
before and after $T(X)$, thus exhibiting identical tail behavior. If $\nu > m$
then marginals are more peaked (in the sense of Birnbaum, 1948) after
applying $T(X)$ than before; and if $\nu > m > 2$, then $T(X)$ serves as a
variance-diminishing transformation.
1.5 A Consistency Property


A random vector $X = (X_1, \ldots, X_p)^T$ is said to have the spherical
distribution if its joint pdf can be written in the form

$$g\left(\sum_{i=1}^{p} x_i^2 \,\Big|\, p\right),$$

where $g(\cdot)$ is referred to as the density generator. The p-variate t pdf
(1.1) with $\mu = 0$ and $\Sigma = I_p$ is spherical because in this case,

$$
g(u \mid p) = \frac{\Gamma((\nu+p)/2)}{(\pi\nu)^{p/2}\,\Gamma(\nu/2)}
\left(1 + \frac{u}{\nu}\right)^{-(\nu+p)/2}.
$$


Other examples of spherical distributions include the multivariate
normal and the multivariate power exponential. A spherical distribution is
said to possess the consistency property if

$$
\int_{-\infty}^{\infty} g\left(\sum_{i=1}^{p+1} x_i^2 \,\Big|\, p+1\right) dx_{p+1}
= g\left(\sum_{i=1}^{p} x_i^2 \,\Big|\, p\right)
\tag{1.6}
$$

for any integer $p$ and almost all $x \in \mathbb{R}^p$. This consistency property
ensures that any marginal distribution of $X$ also belongs to the same
spherical family. Kano (1994) provided several necessary and sufficient
conditions for a spherical distribution to satisfy (1.6). One of them
is that $g$ must be a mixture of normal distributions; specifically, there
exists a random variable $Z > 0$, unrelated to $p$, such that, for any $p$,

$$
g(u \mid p) = \int \left(\frac{z}{2\pi}\right)^{p/2} \exp\left(-\frac{zu}{2}\right) F(dz),
$$

where $F(\cdot)$ denotes the cumulative distribution function (cdf) of $Z$.
Since the multivariate t is a mixture of normal distributions (see (1.2)),
it follows that it must have the consistency property. Other distributions
that have the consistency property include the multivariate normal and
the multivariate Cauchy. Distributions that do not share this property
include the multivariate logistic, multivariate Pearson type II,
multivariate Pearson type VII, and the multivariate Bessel.
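
The consistency property (1.6) can be checked numerically for the t density generator. The following sketch integrates the $(p+1)$-dimensional generator over its last coordinate with a midpoint rule and compares the result with the $p$-dimensional generator. The function names are illustrative, and the substitution $t = \tan\theta$ is simply one convenient way to map the real line to a finite interval.

```python
import math

def t_generator(u, nu, p):
    """Density generator g(u | p) of the spherical t pdf with nu degrees of freedom."""
    log_c = (math.lgamma((nu + p) / 2.0) - math.lgamma(nu / 2.0)
             - (p / 2.0) * math.log(math.pi * nu))
    return math.exp(log_c - (nu + p) / 2.0 * math.log1p(u / nu))

def integrate_out_last(x_sq_sum, nu, p, n=20000):
    """Midpoint-rule value of the integral of g(x_sq_sum + t^2 | p + 1) over real t.

    The substitution t = tan(theta) maps the real line to (-pi/2, pi/2).
    """
    h = math.pi / n
    total = 0.0
    for i in range(n):
        theta = -math.pi / 2.0 + (i + 0.5) * h
        t = math.tan(theta)
        jacobian = 1.0 + t * t          # dt/dtheta = sec^2(theta)
        total += t_generator(x_sq_sum + t * t, nu, p + 1) * jacobian
    return total * h
```

Calling `integrate_out_last(u, nu, p)` and `t_generator(u, nu, p)` with the same arguments should return values that agree to high accuracy, which is the content of (1.6) for this family.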


1.6 Density Expansions 
Fisher (1925) and later Dickey (1967a) provided expansions of the pdf

$$
f(x) = \frac{\Gamma((1+\nu)/2)}{\sqrt{\nu\pi}\,\Gamma(\nu/2)}
\left(1 + \frac{x^2}{\nu}\right)^{-(1+\nu)/2}
$$

of the univariate Student's t distribution. The expansion in the latter
paper involves Appell's polynomials, and hence recurrence schemes are
available for its coefficients. Specifically,


fee) = Shrew (+t e) Yas (—*E pe) atu 
(1.7) 
where 
k-1 
Ga) = Pelt) ~ a QDPD). (1.8) 
T Zo 


Here, $P_k(t)$ are polynomials (in powers of $t$) satisfying


-(1+v)/2 
Trdmary® = (1-75) e9 
k 


l+v 


and $P(\Gamma)$ denotes the polynomial $P(t)$ with the powers $t^r$ replaced
by $\Gamma(r + 1/2)$. Dickey (1967a) also provided an analog of (1.7) for the
multivariate t pdf (1.1). It takes the same form as (1.7) with $x^2$ replaced
by $(x - \mu)^T R^{-1} (x - \mu)$, $\nu + 1$ replaced by $\nu + p$, and with (1.8)
replaced by


Qt) = P,(t)- TỌ rom Èa )Pk-a( 


where $\Gamma_p$ indicates the substitution of $\Gamma(r + p/2)$ for $t^r$.


1.7 Moments 


Since $Y$ and $S$ in (1.2) are independent, the conditional distribution of
$(X_i, X_j)$, given $S = s$, is bivariate normal with means $(\mu_i, \mu_j)$, common
variance $\sigma^2/s^2$, and correlation coefficient $r_{ij}$. Thus,

$$E(X_i) = E[E(X_i \mid S = s)] = E(\mu_i) = \mu_i.$$

To find the second moments, consider the classical identity

$$
\mathrm{Cov}(X_i, X_j) = E[\mathrm{Cov}(X_i, X_j \mid S = s)]
+ \mathrm{Cov}[E(X_i \mid S = s), E(X_j \mid S = s)]
$$

for all $i, j = 1, \ldots, p$. Clearly, one has

$$E[\mathrm{Cov}(X_i, X_j \mid S = s)] = \sigma^2 r_{ij}\, E\left(\frac{1}{S^2}\right)$$

and

$$\mathrm{Cov}[E(X_i \mid S = s), E(X_j \mid S = s)] = 0.$$

If $\nu > 2$, then $E(1/S^2)$ exists and is equal to $\nu/\{\sigma^2(\nu - 2)\}$. Thus, by
choosing $i = j$ and $i < j$, respectively, one obtains

$$\mathrm{Var}(X_i) = \frac{\nu}{\nu - 2}$$

and

$$\mathrm{Cov}(X_i, X_j) = \frac{\nu}{\nu - 2}\, r_{ij}.$$

Hence the matrix $R$ is indeed the correlation matrix as stated in
definition (1.1).

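In the case where $\mu = 0$, the product moments of $X$ are easily found
by exploiting the independence of $Y$ and $S$ in (1.2). One obtains

$$
\mu_{r_1, r_2, \ldots, r_p}
= E\left[\prod_{j=1}^{p} X_j^{r_j}\right]
= E\left[S^{-r} \prod_{j=1}^{p} Y_j^{r_j}\right]
= \sigma^{-r} \nu^{r/2}\, E\left(\chi_\nu^{-r}\right) E\left(\prod_{j=1}^{p} Y_j^{r_j}\right),
$$

provided that $r = r_1 + r_2 + \cdots + r_p < \nu$. In the special case where
$Y_1, \ldots, Y_p$ are mutually independent, one obtains

$$
\mu_{r_1, r_2, \ldots, r_p}
= \sigma^{-r} \nu^{r/2}\, E\left[\chi_\nu^{-r}\right] \prod_{j=1}^{p} E\left[Y_j^{r_j}\right].
$$

If any one of the $r_j$'s is odd, then the moment is zero. If all of them are
even, then

$$
\mu_{r_1, r_2, \ldots, r_p}
= \frac{\nu^{r/2} \prod_{j=1}^{p} \{1 \cdot 3 \cdot 5 \cdots (r_j - 1)\}}
{(\nu - 2)(\nu - 4) \cdots (\nu - r)}.
$$

In particular,

$$\mu_{2,0,\ldots,0} = \frac{\nu}{\nu - 2}, \qquad \nu > 2,$$

$$\mu_{4,0,\ldots,0} = \frac{3\nu^2}{(\nu - 2)(\nu - 4)}, \qquad \nu > 4,$$

$$\mu_{2,2,0,\ldots,0} = \frac{\nu^2}{(\nu - 2)(\nu - 4)}, \qquad \nu > 4,$$

and

$$\mu_{2,2,2,0,\ldots,0} = \frac{\nu^3}{(\nu - 2)(\nu - 4)(\nu - 6)}, \qquad \nu > 6.$$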
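
The closed form for even orders is easy to evaluate directly. The sketch below assumes a central t with $R = I_p$, so that the underlying normal components are independent with unit variances; `product_moment` and `odd_product` are hypothetical helper names introduced only for this illustration.

```python
def odd_product(m):
    """Return 1 * 3 * 5 * ... * m for odd m; the empty product (m = -1) is 1."""
    out = 1
    for k in range(1, m + 1, 2):
        out *= k
    return out

def product_moment(nu, orders):
    """E[prod_j X_j^{r_j}] for a central p-variate t with R = I_p (assumed setting)."""
    r = sum(orders)
    if any(rj % 2 for rj in orders):
        return 0.0                       # an odd order makes the moment vanish
    if r >= nu:
        raise ValueError("the moment of total order r exists only for r < nu")
    numerator = nu ** (r / 2.0)
    for rj in orders:
        numerator *= odd_product(rj - 1)  # E[Y_j^{r_j}] for a standard normal Y_j
    denominator = 1.0
    for k in range(2, r + 1, 2):
        denominator *= (nu - k)           # (nu - 2)(nu - 4)...(nu - r)
    return numerator / denominator
```

For example, `product_moment(10.0, [4, 0])` evaluates $3\nu^2/\{(\nu-2)(\nu-4)\}$ at $\nu = 10$, matching the second of the particular cases listed above.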
1.8 Maximums 

Of special interest are the moments of $Z = \max(X_1, \ldots, X_p)$ when
$X^T = (X_1, \ldots, X_p)$ has the t pdf (1.1) with the mean vector $\mu$ and
covariance matrix $\Sigma$. These moments have applications in decision
theory, particularly in the selection and estimation of the maximum of a
set of parameters. They also have applications in forecasting. The problem
of finding the moments of $Z$ has been considered by Raiffa and Schlaifer
(1961), Afonja (1972), and Cain (1996).

Raiffa and Schlaifer (1961) provided an expression for $E(Z - \theta)$ for
the case where $p = 3$ and $\mu = \theta 1_p$ (where $1_p$ denotes a vector of 1's).
Afonja (1972) generalized this for the general case of unequal means,
variances, and correlations. We mention later a particular case of this
result for $\mu = \theta 1_p$. Let $\phi_p(y; R)$ denote a p-dimensional normal pdf
with zero means, unit variances, and correlation matrix $R$. Also let $R_i$
denote a $p \times p$ matrix with its $(j, j')$th element equal to $r_{i,jj'}$, where
$r_{i,jj'}$ ($j, j' \neq i$) is the correlation between $(X_i - X_j)$ and
$(X_i - X_{j'})$ and $r_{i,ij} = \mathrm{corr}(X_i, X_i - X_j)$. Then the kth moment
of $Z$ is given by


E) = roy >> (5) i (se) 0 (42) Hs (ye) 


i=1 j=0 
(1.9) 


where 


Pf fin fF devon 


dypdyp—1 +++ dyi: + - dy2dyı (1.10) 
is the marginal moment (up to a constant) of truncated normal variates.
The mean and variance can be derived easily from this formula. For
example,

$$
E(Z) = \theta + \{E(W) - \theta\}\, \sqrt{\frac{\nu - 2}{2}}\;
\frac{\Gamma((\nu - 1)/2)}{\Gamma(\nu/2)},
$$

where $W = \max(Y_1, \ldots, Y_p)$ for a p-variate normal random vector $Y^T =
(Y_1, \ldots, Y_p)$ with means equal to $\theta$ and covariance matrix
$(\nu/(\nu - 2))D$.


Afonja (1972) showed further that 


Pp 
EW) = 0+\/-—5 >> vam (ui), 


where $\mu_1(y_i)$ is given by (1.10) for $j = 1$.

More recently, Cain (1996) considered two forecasts $F_1$ and $F_2$ of a
future variable $Y$ where the forecast errors $X_1 = F_1 - Y$ and $X_2 = F_2 - Y$
are assumed to have the bivariate t distribution with means $(\mu_1, \mu_2)$,
variances $(\sigma_1^2, \sigma_2^2)$, correlation coefficient $\rho$, and degrees of
freedom $\nu > 2$. Cain was interested in the maximum $Z = \max(X_1, X_2)$ of the
two forecast errors and whether this nonlinear function could be useful as a
component of a linear combination forecast. It was shown that the pdf
of $Z$ can be written as the sum


f(z) = filz)+ fa(z), 


1 v [ v z-jū 
z = —,/—t tet ace 
fil2) ajVu-—2 ( vy—2 Gi ) 


teejay 


a Pile aad (z) 


for $k = 3 - j$, $j = 1, 2$. Here, $t_\nu$ and $T_\nu$ are, respectively, the pdf and
the cdf of the Student's t distribution with degrees of freedom $\nu$.
Integration by parts yields that


E(Z) = mf fi (e)dz + m f falz)dz + Ttv- , (H =H), 


where 


xTi+v 


Var(Z) = o? a filz)dz + o2 T fa(z)dz 
+m- f hode f hod


+7 (m — He) tv-2 (=£) T fo(z)dz 


-r (1 =n) tran (E) f" p(eyae 


(u — m) (03 — o7) Hı — pe 
v2) toa (BO = 


Hı — H2 
-r°t 5 (“e 7 ) , 
and 


Cov(Z,Xı) = oF |  fileide+ pron f Hoa 
(m — H2) (of — poio) | ; (£ - r) 
) v- T , 


$ T(v -2 


where $\tau = \sqrt{\sigma_1^2 + \sigma_2^2 - 2\rho\sigma_1\sigma_2}$. The two integrals in the
above expressions can be evaluated as


es z Hı — H2 
f Hoa = To (e m — 
[hoi = 1-7, (88 
a T a 


The expression for $\mathrm{Cov}(Z, X_2)$ can be obtained by switching the
subscripts 1 and 2. As $\nu \to \infty$, the above expressions can be reduced by
replacing $t_\nu(\cdot)$ and $T_\nu(\cdot)$ by $\phi(\cdot)$ and $\Phi(\cdot)$, respectively. On the
other extreme, as $\nu \to 2+$, the expressions could be reduced by using the
fact that
fal = fO ifs=0, 
a a a 1/2, ifs #0, 


7 1, if z > 0, 

lim, Ty 2 (y z7) = 1/2, ifa=O0, 

y—-2 Y 0, ifr <0. 
This suggests that the results for the maximum of bivariate t distributed
errors may be materially different from those for bivariate normal errors.

Cain (1996) also investigated whether the maximum $Z$ can
provide information additional to that of $F_1$ and $F_2$ in forecasting $Y$
via a linear combination of the form $F = \alpha + \beta_1 F_1 + \beta_2 F_2 + \gamma M$ with
$\beta_1 + \beta_2 + \gamma = 1$. Cain showed that the mean squared error of $F$ is
minimized when $\gamma = 0$ and hence that $M$ is linearly dominated by $F_1$
and $F_2$. Similar calculations reveal that the mean forecast $(F_1 + F_2)/2$
dominates $M$ if and only if either $\mu_1 = \mu_2$ or $\sigma_1 = \sigma_2$. Evidently further
investigations are in order (to consider, for example, the case of more
than two forecasts).


1.9 Distribution of a Linear Function 


If $X$ has the p-variate t distribution with degrees of freedom $\nu$, mean
vector $\mu$, and correlation matrix $R$, then, for any nonsingular scalar matrix
$C$ and for any $a$, $CX + a$ has the p-variate t distribution with degrees of
freedom $\nu$, mean vector $C\mu + a$, and correlation matrix $CRC^T$. This
result is of importance in applications and is similar to the corresponding
result for the multivariate normal distribution.


1.10 Marginal Distributions 


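Let $X$ possess the p-variate t distribution with degrees of freedom $\nu$,
mean vector $\mu$, and correlation matrix $R$. Consider the partitions

$$X = \begin{pmatrix} X_1 \\ X_2 \end{pmatrix}, \tag{1.11}$$

$$\mu = \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}, \tag{1.12}$$

and

$$R = \begin{pmatrix} R_{11} & R_{12} \\ R_{21} & R_{22} \end{pmatrix}, \tag{1.13}$$

where $X_1$ is $p_1 \times 1$ and $R_{11}$ is $p_1 \times p_1$. Then $X_1$ has the $p_1$-variate
t distribution with degrees of freedom $\nu$, mean vector $\mu_1$, correlation
matrix $R_{11}$, and with the joint pdf given by

$$
f(x_1) = \frac{\Gamma((\nu + p_1)/2)}{(\pi\nu)^{p_1/2}\,\Gamma(\nu/2)\,|R_{11}|^{1/2}}
\left[1 + \frac{1}{\nu}(x_1 - \mu_1)^T R_{11}^{-1} (x_1 - \mu_1)\right]^{-(\nu + p_1)/2}.
$$

Moreover, $X_2$ also has the $(p - p_1)$-variate t distribution with degrees of
freedom $\nu$, mean vector $\mu_2$, correlation matrix $R_{22}$, and with the joint
pdf given by

$$
f(x_2) = \frac{\Gamma((\nu + p - p_1)/2)}{(\pi\nu)^{(p - p_1)/2}\,\Gamma(\nu/2)\,|R_{22}|^{1/2}}
\left[1 + \frac{1}{\nu}(x_2 - \mu_2)^T R_{22}^{-1} (x_2 - \mu_2)\right]^{-(\nu + p - p_1)/2}.
$$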


1.11 Conditional Distributions 


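Several interesting properties have been obtained for conditional pdfs of
the multivariate t distribution. If $X$ has the central p-variate t
distribution with degrees of freedom $\nu$ and correlation matrix $R$, it then
follows from Section 1.10 that the conditional pdf of $X_2$ given $X_1$ is
given by

$$
f(x_2 \mid x_1) = \frac{\Gamma((\nu + p)/2)}{(\pi\nu)^{(p - p_1)/2}\,\Gamma((\nu + p_1)/2)}\,
\frac{|R_{11}|^{1/2}}{|R|^{1/2}}\,
\frac{\left[1 + (1/\nu)\, x_1^T R_{11}^{-1} x_1\right]^{(\nu + p_1)/2}}
{\left[1 + (1/\nu)\, x^T R^{-1} x\right]^{(\nu + p)/2}}.
\tag{1.14}
$$

Since

$$|R| = |R_{11}|\,\left|R_{22} - R_{21} R_{11}^{-1} R_{12}\right|$$

and

$$x^T R^{-1} x = x_1^T R_{11}^{-1} x_1 + x_{2\cdot 1}^T R_{22\cdot 1}^{-1} x_{2\cdot 1},$$

where

$$x_{2\cdot 1} = x_2 - R_{21} R_{11}^{-1} x_1$$

and

$$R_{22\cdot 1} = R_{22} - R_{21} R_{11}^{-1} R_{12},$$

one can rewrite (1.14) as

$$
f(x_2 \mid x_1) = \frac{\Gamma((\nu + p)/2)}
{\{\pi(\nu + p_1)\}^{(p - p_1)/2}\,\Gamma((\nu + p_1)/2)\,|R_{22\cdot 1}|^{1/2}}
\left[1 + \frac{1}{\nu + p_1}\,
\frac{\{(\nu + p_1)/\nu\}\, x_{2\cdot 1}^T R_{22\cdot 1}^{-1} x_{2\cdot 1}}
{1 + (1/\nu)\, x_1^T R_{11}^{-1} x_1}\right]^{-(\nu + p)/2}
\left[\frac{(\nu + p_1)/\nu}{1 + (1/\nu)\, x_1^T R_{11}^{-1} x_1}\right]^{(p - p_1)/2}.
\tag{1.15}
$$

Landenna and Ferrari (1988) noted that this conditional pdf is not a
$(p - p_1)$-variate t unless the values of $x_1$ are $\pm 1$. For example, consider
the special case of (1.15) for $R = I_p$. In this case, (1.15) becomes

$$
f(x_2 \mid x_1) = \frac{\Gamma((\nu + p)/2)}
{\pi^{(p - p_1)/2}\,\Gamma((\nu + p_1)/2)\,\left(\nu + \sum_{j=1}^{p_1} x_j^2\right)^{(p - p_1)/2}}
\left[1 + \frac{1}{\nu + \sum_{j=1}^{p_1} x_j^2} \sum_{j=p_1+1}^{p} x_j^2\right]^{-(\nu + p)/2}.
\tag{1.16}
$$

When $x_j = \pm 1$, $j = 1, 2, \ldots, p_1$, (1.16) reduces to

$$
f(x_2 \mid x_1) = \frac{\Gamma((\nu + p)/2)}
{\pi^{(p - p_1)/2}\,\Gamma((\nu + p_1)/2)\,(\nu + p_1)^{(p - p_1)/2}}
\left[1 + \frac{1}{\nu + p_1} \sum_{j=p_1+1}^{p} x_j^2\right]^{-(\nu + p)/2},
$$

which is the joint pdf of a central $(p - p_1)$-variate t distribution with
degrees of freedom $(\nu + p_1)$ and correlation matrix $I_{p - p_1}$. Landenna and
Ferrari (1988) also described the manner in which the probabilities of
the conditional pdf (1.15) can be expressed in terms of the probabilities
of $x_2$ conditioned on $x_1$ taking the values $\pm 1$.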

The form of the conditional pdf (1.15) also suggests that

$$Y_1 = X_1 \tag{1.17}$$

and

$$
Y_2 = \sqrt{\frac{\nu + p_1}{\nu}}
\left(1 + \frac{1}{\nu} X_1^T R_{11}^{-1} X_1\right)^{-1/2} X_{2\cdot 1}
\tag{1.18}
$$

are independent, that $Y_1$ has the central $p_1$-variate t distribution with
degrees of freedom $\nu$ and correlation matrix $R_{11}$, and that $Y_2$ has the
central $(p - p_1)$-variate t distribution with degrees of freedom $\nu + p_1$ and
correlation matrix $R_{22\cdot 1}$. From this observation, it follows easily that
the conditional expectation of $X_2$ given $X_1$ is linear and that
$E(X_2 \mid X_1 = x_1) = R_{21} R_{11}^{-1} x_1$. In particular,

$$
E(X_p \mid X_1 = x_1, \ldots, X_{p-1} = x_{p-1})
= -\frac{1}{r^{pp}} \sum_{j=1}^{p-1} r^{jp} x_j
$$

and

$$
\mathrm{Var}(X_p \mid X_1 = x_1, \ldots, X_{p-1} = x_{p-1})
= \frac{1}{r^{pp}}\, \frac{\nu}{\nu + p - 3}
\left[1 + \frac{1}{\nu} \sum_{j,k=1}^{p-1}
\left(r^{jk} - \frac{r^{jp} r^{kp}}{r^{pp}}\right) x_j x_k\right],
\tag{1.19}
$$

where $r^{jk}$ is the $(j, k)$th element of $R^{-1}$ (Bennett, 1961). It is
illuminating to compare the conditional variance (1.19) with the value
$1/r^{pp}$, corresponding to the conditional variance of the multivariate
normal distribution.
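
For $p = 2$ the conditional mean and the conditional variance (1.19) reduce to short closed forms, which the following sketch evaluates from the elements of $R^{-1}$. It assumes a central bivariate t with $\nu > 1$ and correlation `rho`; the function name is illustrative only.

```python
def conditional_moments_bivariate(x1, nu, rho):
    """Mean and variance of X_2 given X_1 = x1 for a central bivariate t (p = 2)."""
    p = 2
    det_r = 1.0 - rho * rho
    # Elements r^{jk} of R^{-1} for R = [[1, rho], [rho, 1]].
    r11_inv = 1.0 / det_r
    r12_inv = -rho / det_r
    r22_inv = 1.0 / det_r
    # Conditional mean: -(1 / r^{pp}) * sum_j r^{jp} x_j, here a single term.
    mean = -(r12_inv / r22_inv) * x1
    # Quadratic form appearing inside the bracket of (1.19).
    quad = (r11_inv - r12_inv * r12_inv / r22_inv) * x1 * x1
    var = (1.0 / r22_inv) * (nu / (nu + p - 3.0)) * (1.0 + quad / nu)
    return mean, var
```

The result agrees with the familiar bivariate expressions $\rho x_1$ and $(1 - \rho^2)(\nu + x_1^2)/(\nu - 1)$; unlike the normal case, the conditional variance grows with $x_1^2$.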

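Siotani (1976) generalized the result of (1.17)-(1.18) by splitting $X$
into more than two sets of variates. Let

$$X = \begin{pmatrix} X_1 \\ X_2 \\ \vdots \\ X_k \end{pmatrix} \tag{1.20}$$

and

$$
R = \begin{pmatrix}
R_{11} & R_{12} & \cdots & R_{1k} \\
R_{21} & R_{22} & \cdots & R_{2k} \\
\vdots & \vdots &        & \vdots \\
R_{k1} & R_{k2} & \cdots & R_{kk}
\end{pmatrix},
$$

where $X_l$ is $p_l \times 1$ for $l = 1, 2, \ldots, k$ and $R_{lm}$ is $p_l \times p_m$ for
$l = 1, 2, \ldots, k$, $m = 1, 2, \ldots, k$. Clearly $p_1 + p_2 + \cdots + p_k = p$.
Introducing the notations

$$q_l = p_1 + p_2 + \cdots + p_l, \tag{1.21}$$

$$X_{(l)} = \begin{pmatrix} X_1 \\ X_2 \\ \vdots \\ X_l \end{pmatrix}, \tag{1.22}$$

$$
R_{(l)} = \begin{pmatrix}
R_{11} & R_{12} & \cdots & R_{1l} \\
R_{21} & R_{22} & \cdots & R_{2l} \\
\vdots & \vdots &        & \vdots \\
R_{l1} & R_{l2} & \cdots & R_{ll}
\end{pmatrix},
\tag{1.23}
$$

$$
R_{(l),l+1} = \begin{pmatrix} R_{1,l+1} \\ \vdots \\ R_{l,l+1} \end{pmatrix},
\tag{1.24}
$$

and

$$
R_{l+1\cdot(l)} = R_{l+1,l+1} - R_{(l),l+1}^T R_{(l)}^{-1} R_{(l),l+1},
\tag{1.25}
$$

Siotani showed that

$$Y_1 = X_1$$

and

$$
Y_{l+1} = \sqrt{\frac{\nu + q_l}{\nu}}
\left(1 + \frac{1}{\nu} X_{(l)}^T R_{(l)}^{-1} X_{(l)}\right)^{-1/2}
\left(X_{l+1} - R_{(l),l+1}^T R_{(l)}^{-1} X_{(l)}\right)
$$

for $l = 1, \ldots, k - 1$ are independent, that $Y_1$ has the central $p_1$-variate
t distribution with degrees of freedom $\nu$ and correlation matrix $R_{11}$,
and that $Y_{l+1}$ has the central $p_{l+1}$-variate t distribution with degrees of
freedom $(\nu + q_l)$ and correlation matrix $R_{l+1\cdot(l)}$ for $l = 1, \ldots, k - 1$.
In the special case for $R = I_p$, the $Y$'s can be written as

$$Y_1 = X_1$$

and

$$
Y_{l+1} = \sqrt{\frac{\nu + q_l}{\nu}}
\left(1 + \frac{1}{\nu} \sum_{m=1}^{l} X_m^T X_m\right)^{-1/2} X_{l+1}.
$$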

1.12 Quadratic Forms 


If $X$ has the p-variate t distribution with degrees of freedom $\nu$, mean
vector $\mu$, and correlation matrix $R$, then $X^T R^{-1} X/p$ has the noncentral F
distribution with degrees of freedom $p$ and $\nu$ and noncentrality
parameter $\mu^T R^{-1} \mu/p$. See Hsu (1990) for a particular case of this result.
When $\mu = 0$, the distribution is central F and so
$X^T R^{-1} X/(\nu + X^T R^{-1} X)$ has the Beta$(p/2, \nu/2)$ distribution. There are
a number of problems related to quadratic forms of multivariate t that are
worthy of further investigation.
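
The central case can be illustrated by simulation. The sketch below assumes $\mu = 0$ and $R = I_p$, builds $Q = X^T X$ from representation (1.2), and compares the sample means of $Q/p$ and $Q/(\nu + Q)$ with the means of the F$(p, \nu)$ and Beta$(p/2, \nu/2)$ distributions, namely $\nu/(\nu - 2)$ and $p/(p + \nu)$. The function names and the seed are arbitrary choices made for the example.

```python
import random

def quadratic_form_over_p(nu, p, rng):
    """One draw of X^T X / p for a central p-variate t with R = I_p, via (1.2)."""
    y_sq = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(p))   # Y^T Y ~ chi-squared(p)
    s_sq = rng.gammavariate(nu / 2.0, 2.0) / nu              # nu * S^2 ~ chi-squared(nu)
    return y_sq / s_sq / p

def simulate(nu, p, n, seed=1):
    """Sample means of Q / p and Q / (nu + Q), where Q = X^T X."""
    rng = random.Random(seed)
    f_draws = [quadratic_form_over_p(nu, p, rng) for _ in range(n)]
    beta_draws = [p * f / (nu + p * f) for f in f_draws]
    return sum(f_draws) / n, sum(beta_draws) / n
```

A run such as `simulate(10.0, 3, 40000)` should give a first value near 1.25 and a second near 3/13, up to Monte Carlo error.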
1.13 F Matrix


Consider two independent random samples x), : x) and x?) zy x2) 


from two different elliptical distributions (which contain the multivariate t
as a particular case, as already mentioned in Section 1.1). Let


= Sd OT 


for i = 1,2. Then F = (S1 /n1)/(S2/n2) is the multivariate F matrix. 
Hayakawa (1989) studied the asymptotic behavior of the determinant, 
latent roots, latent vectors, and the trace of the F matrix for an elliptical 
population. These results are useful in the study of the robustness of 
the statistics derived for testing several hypotheses about parameters of 
a normal population with the elliptical distribution introduced as the 
alternative population. Hayakawa (1989) illustrated the usefulness of 
the results through a multivariate t-population. 


1.14 Association 


The well known definition states that the random variables $X_1, \ldots, X_p$
are said to be associated if

$$\mathrm{Cov}(f(X_1, \ldots, X_p), g(X_1, \ldots, X_p)) \ge 0$$

for all nondecreasing functions $f$, $g$ (Esary et al., 1967). Association
implies positive quadrant dependence, that is, that
$\Pr\{\bigcap_{i=1}^{p} (X_i \le x_i)\} \ge \prod_{i=1}^{p} \Pr(X_i \le x_i)$ for all real
numbers $x_1, \ldots, x_p$ (Lehmann, 1966).
Jogdeo (1977) and Abdel-Hameed and Sampson (1978) established that
the components of a multivariate t random vector are associated
under certain conditions on correlations. More generally, the following
result holds. Let $Z$ be a p-variate vector with independent and real
components, each having a symmetric unimodal distribution. Suppose
$Y = Z + U$, where $U$ is independent of $Z$ and either

(i) $U = (\alpha_1 V, \ldots, \alpha_k V, \alpha_{k+1} W, \ldots, \alpha_p W)$, where $(V, W)$ has a
bivariate normal distribution centered at 0,

(ii) or $U = \alpha W$, where $\alpha$ is an arbitrary but fixed p-variate vector
and $W$ is an arbitrary real random variable.
For $(n + 1)$ independent and identically distributed (iid) copies $Y_i^T =
(Y_{i1}, \ldots, Y_{ip})$, $i = 0, 1, \ldots, n$ of $Y$ define $X_j^2$, $j = 1, \ldots, p$ by

$$X_j^2 = \frac{Y_{0j}^2}{\sum_{i=1}^{n} Y_{ij}^2}.$$

Then the variables $X_j^2$ (or, equivalently, $|X_j|$), $j = 1, \ldots, p$ are
associated.

Now, redefine $Y$ as a p-variate normal random vector with zero means
and covariance matrix specified by $\Sigma = \{r_{ij}\sigma_i\sigma_j\}$. Let $S^2$ and $S_k^2$ be
independent chi-squared random variables with degrees of freedom $n$
and $q_k$, respectively, for $k = 1, \ldots, p$. Also assume that $Y$, $S^2$, and $S_k^2$
are mutually independent. Then, as a consequence of the above general
result, one could provide the following assertions about bivariate and
trivariate t vectors:

• For $p = 2$, the random variables

  $$
  (X_1, X_2) = \left(\frac{|Y_1|}{\sqrt{S^2 + S_1^2}},\ \frac{|Y_2|}{\sqrt{S^2 + S_2^2}}\right)
  $$

  are associated.

• For $p = 3$, if $\prod_{i<j} \mathrm{sign}(\lambda_{ij}) < 0$, where
  $\Lambda = \{\lambda_{ij}\} = \Sigma^{-1}$, then the random variables

  $$
  (X_1, X_2, X_3) = \left(\frac{|Y_1|}{\sqrt{S^2 + S_1^2}},\ \frac{|Y_2|}{\sqrt{S^2 + S_2^2}},\
  \frac{|Y_3|}{\sqrt{S^2 + S_3^2}}\right)
  $$

  are associated.


1.15 Entropy 


The entropy of a continuous random vector X may be regarded as a 
descriptive quantity, just as the median, mode, variance, and the co- 
efficient of skewness may be regarded as descriptive parameters. The 
entropy is a measure of the extent to which a multivariate distribution 
is concentrated on a few points or dispersed over many points. Thus, the 
entropy is a measure of dispersion, somewhat like the standard deviation 
in the univariate case. 
Mathematically, the entropy of $X$ is defined by

$$H(X) = E[-\log f(X)] = -\int f(x) \log f(x)\, dx. \tag{1.26}$$

Guerrero-Cusumano (1996a) derived the forms of this for the
multivariate t distribution. For a central p-variate t, it turns out that

$$
H(X; R) = \frac{1}{2}\log|R|
+ \log\left[\frac{(\nu\pi)^{p/2}}{\Gamma(p/2)}\, B\left(\frac{p}{2}, \frac{\nu}{2}\right)\right]
+ \frac{\nu + p}{2}\left[\psi\left(\frac{\nu + p}{2}\right) - \psi\left(\frac{\nu}{2}\right)\right],
\tag{1.27}
$$

where $\psi(t) = d\log\Gamma(t)/dt$ denotes the digamma function. Note that
(1.27) can be reexpressed as $H(X) = (1/2)\log|R| + \Phi(\nu, p)$, where
$\Phi(\nu, p)$ is a constant that depends only on $\nu$ and $p$. Table 1 in
Guerrero-Cusumano (1996a) tabulates $\Phi(\nu, p)$ for $\nu = 1(1)35$ and
$p = 1(1)5$. The following is an abridged version of the table.


Constant $\Phi$ for $H(X) = (1/2)\log|R| + \Phi(\nu, p)$

  ν      p = 1      p = 2      p = 3      p = 4      p = 5

  1     2.53102    4.83788    7.06205    9.24381    11.3999
  2     1.96028    3.83788    5.67306    7.48261    9.27502
  3     1.77348    3.50454    5.20997    6.89826    8.57432
  4     1.68176    3.33788    4.97687    6.60362    8.22121
  5     1.62750    3.23788    4.83602    6.42500    8.00685
  6     1.59172    3.17121    4.74153    6.30474    7.86226
  7     1.56638    3.12359    4.67368    6.21809    7.75785
  8     1.54750    3.08788    4.62257    6.15261    7.67878
  9     1.53289    3.06010    4.58266    6.10135    7.61677
 10     1.52126    3.03788    4.55062    6.06010    7.56678
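
The tabulated constants can be reproduced from (1.27) with a few lines of code. The sketch below uses only the standard library, so the digamma function is approximated by an upward recurrence followed by a standard asymptotic series; that approximation and the helper names `digamma` and `entropy_constant` are assumptions of this example rather than part of the cited paper.

```python
import math

def digamma(x):
    """Digamma function for x > 0 via upward recurrence and an asymptotic series."""
    acc = 0.0
    while x < 6.0:                 # psi(x) = psi(x + 1) - 1/x
        acc -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    # Terms 1/(12 x^2) - 1/(120 x^4) + 1/(252 x^6) of the asymptotic expansion.
    series = inv2 * (1.0 / 12.0 - inv2 * (1.0 / 120.0 - inv2 / 252.0))
    return acc + math.log(x) - 0.5 / x - series

def entropy_constant(nu, p):
    """Phi(nu, p) from (1.27): the entropy of a central p-variate t when R = I_p."""
    # log[(nu*pi)^{p/2} B(p/2, nu/2) / Gamma(p/2)], with the beta function expanded.
    log_term = ((p / 2.0) * math.log(nu * math.pi)
                + math.lgamma(nu / 2.0) - math.lgamma((nu + p) / 2.0))
    return log_term + (nu + p) / 2.0 * (digamma((nu + p) / 2.0) - digamma(nu / 2.0))
```

Evaluating `entropy_constant(nu, p)` over $\nu = 1, \ldots, 10$ and $p = 1, \ldots, 5$ should match the table to the printed precision, and for large $\nu$ the value approaches the normal constant $(p/2)\log(2\pi e)$.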


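The particular case of (1.27) for $\nu = 1$ gives the entropy for the
multivariate Cauchy distribution

$$
H(X; R) = \frac{1}{2}\log|R|
+ \log\left[\frac{\pi^{p/2}}{\Gamma(p/2)}\, B\left(\frac{p}{2}, \frac{1}{2}\right)\right]
+ \frac{1 + p}{2}\left[\psi\left(\frac{1 + p}{2}\right) - \psi\left(\frac{1}{2}\right)\right].
$$

As $\nu \to \infty$, (1.27) converges to the entropy of the normal distribution
given by

$$H(X; R) = \frac{p}{2}\log(2\pi e) + \frac{1}{2}\log|R|. \tag{1.28}$$

The sampling properties of (1.27) will be discussed in Chapter 9.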
For the noncentral p-variate t, (1.26) takes the general form 
(vr)?! 


H(X;R) = Flog iR| + 10g |B (2,2) + 


“=? (47,4), 
(1.29) 


where A = wp’ Ro! and M(v,p, A) is given by 


wien) = (E)E E v} 


j=0 
Setting v = 1 in (1.29), one can obtain the entropy of the noncentral p- 
variate Cauchy distribution. In the case p = 1, (1.29) coincides with the 
entropies for the univariate Student’s t and Cauchy distributions given, 
for example, in Lazo and Rathie (1978). 

Zografos (1999) provided a maximum entropy characterization of (1.1).
The maximum entropy principle suggests approximating the unknown
pdf of $X$ by the model that maximizes (1.26) subject to the constraints
that define the class of pdfs considered. Jaynes (1957) asserted that the
maximum entropy distribution, obtained by this constrained
maximization problem, “is the only unbiased assignment we can make; to use any
other would amount to an arbitrary assumption of information which
by hypothesis we do not have.” Zografos (1999) showed that (1.1) is the
solution to maximizing $E[-\log f(X)]$ subject to the constraint

$$
E\left[\log\left\{1 + \frac{1}{\nu}(X - \mu)^T R^{-1} (X - \mu)\right\}\right]
= \omega\left(\frac{\nu + p}{2}; \frac{p}{2}\right),
$$

where $\omega(x; a) = \psi(x) - \psi(x - a)$, $x > a$, and $\psi(\cdot)$ denotes the digamma
function. For further discussion of maximum entropy methods, see Fry
(2002).


1.16 Kullback-Leibler Number 


The mutual information of a continuous random vector $X$ with joint pdf
$f(x)$ and marginal pdfs $f(x_i)$, $i = 1, \ldots, p$ is defined by

$$
T(X) = \int f(x) \log\left[\frac{f(x)}{\prod_{i=1}^{p} f(x_i)}\right] dx,
\tag{1.30}
$$

with the domain of variation given by $0 \le T(X) < \infty$. (The reader
should not confuse this with the transformation $T(X)$ given in (1.5).)
The quantity (1.30) can be considered a measure of dependence (Joe,
1989). The larger the $T(X)$, the higher the dependence among the
variables $X_i$, $i = 1, \ldots, p$. Naturally, $T(X) = 0$ implies that the variables
are independent; this latter statement follows from the fact that $T$ is a
special case of the Kullback-Leibler number, $KL(f, g)$ (Kullback, 1968).
When the variables of $X$ are multivariate normal with covariance matrix
$\Sigma$, it is easy to compute $T(X)$ as the difference between entropies given
by (1.28); specifically,

$$T(X; \Sigma) = H(X; D) - H(X; \Sigma),$$

where $D$ is a diagonal matrix corresponding to $\Sigma$ with the elements
$\sigma_{11}, \ldots, \sigma_{pp}$. This is due to the well known fact that uncorrelatedness
implies independence in the normal case. This fact also implies that
$T(X; I) = 0$. In general, for any member of an elliptical family of
distributions, this is not true; in other words, uncorrelatedness does not
imply that $T(X) = 0$. The mutual information attempts to summarize
in a single number the whole dependence structure of the multivariate
distribution of $X$.

Guerrero-Cusumano (1996b) derived the form of (1.30) for the multi- 
variate t distribution. For a central p-variate t, it turns out that 


\[
T(\mathbf{X}) = \Omega - \frac{1}{2}\log|\mathbf{R}|, \tag{1.31}
\]


where $\Omega$ is given by


ee oe) ee 


2 
ptv ptv v 
— —pl(—}>. 1.32 
r {Y (Ps) -¥G)] (1.82) 
It is easy to see that $\Omega \to 0$ as $\nu \to \infty$. The mutual information for the multivariate normal distribution with correlation matrix $\mathbf{R}$ is given by $-(1/2)\log|\mathbf{R}|$ (Kullback, 1968). The particular case of (1.31) for $\nu = 1$ gives the mutual information for the multivariate Cauchy distribution with $\Omega$ taking the simpler form


0 = maf TED) 2 (22) C) 
Table 1 in Guerrero-Cusumano (1996b) provides values of (1.32) for a range of $\nu$ and $p$. The following is an abridged version.


Constant $\Omega$ for $T(\mathbf{X}) = \Omega - (1/2)\log|\mathbf{R}|$


p=2 p=3 p=4 p=5 


0.4196180 0.949615 1.530690 2.141170 
0.2927000 0.705474 1.184010 1.704100 
0.2254360 0.565424 0.975130 1.431820 
0.1835450 0.473177 0.832265 1.240460 
0.1548760 0.407380 0.727338 1.096790 
0.1339950 0.357917 0.646600 0.984235 
0.1180970 0.319304 0.582368 0.893344 
0.1055830 0.288289 0.529959 0.818244 
0.0954730 0.262813 0.486337 0.755056 
0.0871342 0.241503 0.449434 0.701101 


Figures 1.5 and 1.6 graph $T(\mathbf{X})$ in (1.31) for $p = 2$ and $p = 4$, respectively. The correlation matrix $\mathbf{R}$ is taken to have the equicorrelation structure $r_{ij} = \rho$, $i \ne j$. It is interesting to see the “dale-shaped” three-dimensional plot. The figures show that, as one moves toward the center of the “dale,” the dependence among the variables decreases, and, as one moves away from the center, the dependence increases.

For the normal case, Linfoot (1957) and Joe (1989) suggested a parameterization for $T(\mathbf{X})$ to make it comparable to a correlation coefficient. They defined the induced correlation coefficient based on the mutual information as

\[
\rho_T = \sqrt{1-\exp\{-2T(\mathbf{X})\}}. \tag{1.33}
\]

Guerrero-Cusumano (1998) suggested a similar measure for the multivariate t distribution referred to as the dependence coefficient. It is given by

\[
\rho_T = \sqrt{1-|\mathbf{R}|\exp(-2\Omega)}. \tag{1.34}
\]


The dependence coefficient is a quantification of dependence among the $p$ variables of $\mathbf{X}$. This follows from the fact that independence implies $\rho_T = 0$ and that $T(\mathbf{X}) = \infty$ implies $\rho_T = 1$. When $\nu \to \infty$, (1.34) coincides with (1.33).
The sampling properties of (1.31) will be discussed in Chapter 9.


Fig. 1.5. Mutual information, (1.31), for p = 2
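
The relations (1.33) and (1.34) are easy to evaluate for the equicorrelation structure used in the figures. The sketch below is illustrative only: the function names are hypothetical, the determinant formula $|\mathbf{R}| = (1-\rho)^{p-1}\{1+(p-1)\rho\}$ is the standard one for an equicorrelation matrix, and the value of $\Omega$ must be supplied by the user (for example, from the table above).

```python
import math


def equicorrelation_det(p, rho):
    """|R| when all off-diagonal entries of the p x p correlation matrix equal rho.

    Requires -1/(p - 1) < rho < 1 so that R is positive definite.
    """
    return (1.0 - rho) ** (p - 1) * (1.0 + (p - 1) * rho)


def normal_mutual_information(det_R):
    """Mutual information -(1/2) log|R| of the multivariate normal."""
    return -0.5 * math.log(det_R)


def induced_correlation(T):
    """Induced correlation coefficient (1.33)."""
    return math.sqrt(max(0.0, 1.0 - math.exp(-2.0 * T)))


def dependence_coefficient(omega, det_R):
    """Dependence coefficient (1.34) for a user-supplied constant omega."""
    return math.sqrt(max(0.0, 1.0 - det_R * math.exp(-2.0 * omega)))


if __name__ == "__main__":
    det_R = equicorrelation_det(2, 0.6)
    print(induced_correlation(normal_mutual_information(det_R)))
```

For $p = 2$ the induced correlation coefficient reduces to $|\rho|$, and setting $\Omega = 0$ in the dependence coefficient recovers the normal case, consistent with the statement that (1.34) coincides with (1.33) as $\nu \to \infty$.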


Fig. 1.6. Mutual information, (1.31), for p = 4


1.17 Rényi Information


Since the concept of Rényi information is not widely available in the literature, we provide here a brief discussion of the concept. Rényi information of order $\lambda$ for a continuous random variable with pdf $f$ is defined as

\[
\mathcal{I}_R(\lambda) := \frac{1}{1-\lambda}\log\left(\int f^{\lambda}(x)\,dx\right) \tag{1.35}
\]

for $\lambda \ne 1$. Its value for $\lambda = 1$ is taken as the limit

\[
\mathcal{I}_R(1) := \lim_{\lambda\to 1}\mathcal{I}_R(\lambda) = -\int f(x)\log f(x)\,dx = -E[\log f(X)],
\]


which is the well known Shannon entropy. Rényi’s (1959, 1960, 1961) generalization of the Shannon entropy allows for “different averaging of probabilities” via $\lambda$. Sometimes (1.35) is also referred to as the spectrum of Rényi information. Rényi information finds its applications as a measure of complexity in areas of physics, information theory, and engineering to describe many nonlinear dynamical or chaotic systems (Kurths et al., 1995) and in statistics as certain appropriately scaled test statistics (Rényi distances or relative Rényi information) for testing hypotheses in parametric models (Morales et al., 1997). The gradient $\mathcal{I}_R'(\lambda) = d\mathcal{I}_R(\lambda)/d\lambda$ also conveys useful information. In fact, a direct calculation based on (1.35), assuming that the integral $\int f^{\lambda}(x)\,dx$ is well defined and differentiation operations are legitimate, shows that

\[
\begin{aligned}
\mathcal{I}_R'(1) &= \lim_{\lambda\to 1}\frac{1}{(1-\lambda)^2}\left\{(1-\lambda)\,\frac{\int f^{\lambda}(x)\log f(x)\,dx}{\int f^{\lambda}(x)\,dx} + \log\left(\int f^{\lambda}(x)\,dx\right)\right\} \\
&= -\frac{1}{2}\lim_{\lambda\to 1}\left\{\frac{\int f^{\lambda}(x)\log^2 f(x)\,dx}{\int f^{\lambda}(x)\,dx} - \left(\frac{\int f^{\lambda}(x)\log f(x)\,dx}{\int f^{\lambda}(x)\,dx}\right)^{2}\right\} \\
&= -\frac{1}{2}\,\mathrm{Var}[\log f(X)].
\end{aligned}
\]


In other words, the gradient of Rényi information at $\lambda = 1$ is simply the negative half of the variance of the log-likelihood, just as the entropy is the negative of the expected log-likelihood. Thus, the variance of the log-likelihood, $\mathcal{I}_f := -2\mathcal{I}_R'(1)$, measures the intrinsic shape of the distribution. This can be seen by observing that $\mathcal{I}_f = \mathcal{I}_g$, where $f(x) = (1/\sigma)\,g((x-\mu)/\sigma)$; that is, the measure is unaffected by changes of location and scale. In fact, according to Bickel and Lehmann (1975), it can serve as a measure of the shape of a distribution. In the case where $f(x)$ has a finite fourth moment, it plays a similar role as a kurtosis measure in comparing the shapes of various frequently used densities and measuring the heaviness of tails, but it measures more than what kurtosis measures.

Rényi information of order $\lambda$ for a $p$-variate random vector with joint pdf $f$ is defined as

\[
\mathcal{I}_R(\lambda) := \frac{1}{1-\lambda}\log\left(\int\cdots\int f^{\lambda}(x_1,\ldots,x_p)\,dx_1\cdots dx_p\right). \tag{1.36}
\]

The gradient $\mathcal{I}_R'(\lambda)$ and the measure $\mathcal{I}_f$ are defined similarly.

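Song (2001) provided a comprehensive account of $\mathcal{I}_R(\lambda)$, $\mathcal{I}_R'(\lambda)$, and $\mathcal{I}_f$ for well known univariate and multivariate distributions. For the univariate Student’s t distribution with degrees of freedom $\nu$, it can be shown for $\lambda > 1/(1+\nu)$ that

\[
\mathcal{I}_R(\lambda) = \frac{1}{1-\lambda}\log\left\{\frac{B((\nu\lambda+\lambda-1)/2,\,1/2)}{B^{\lambda}(\nu/2,\,1/2)}\right\} + \frac{1}{2}\log\nu,
\]

\[
\mathcal{I}_R'(\lambda) = \frac{1}{(1-\lambda)^2}\left[\log\left\{\frac{B((\nu\lambda+\lambda-1)/2,\,1/2)}{B(\nu/2,\,1/2)}\right\} + \frac{(1-\lambda)(\nu+1)}{2}\,\psi\left(\frac{\nu\lambda+\lambda-1}{2}\right) - \frac{(1-\lambda)(\nu+1)}{2}\,\psi\left(\frac{\nu\lambda+\lambda}{2}\right)\right],
\]

and

\[
\mathcal{I}_f(\nu) = \frac{(\nu+1)^2}{4}\left\{\psi'\left(\frac{\nu}{2}\right) - \psi'\left(\frac{\nu+1}{2}\right)\right\}.
\]

Using tables in Abramowitz and Stegun (1965), one obtains the particular values

\[
\mathcal{I}_f(1) = \frac{\pi^2}{3},\quad
\mathcal{I}_f(2) = 9-\frac{3\pi^2}{4},\quad
\mathcal{I}_f(3) = \frac{4\pi^2}{3}-12,\quad
\mathcal{I}_f(4) = \frac{775}{36}-\frac{25\pi^2}{12},\quad
\mathcal{I}_f(5) = 3\pi^2-\frac{115}{4}.
\]

It is interesting to note that the measure $\mathcal{I}_f(\nu)$ decreases as $\nu$ increases, which makes sense since the tails become lighter as $\nu$ increases. In fact, it can be shown, using asymptotic formulas for the trigamma function, that $\lim_{\nu\to\infty}\mathcal{I}_f(\nu) = 1/2$, which corresponds to the measure $\mathcal{I}_f$ for the normal distribution.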
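
The behavior of the shape measure can be checked numerically. The following sketch assumes the closed form $\mathcal{I}_f(\nu) = \frac{(\nu+p)^2}{4}\{\psi'(\nu/2)-\psi'((\nu+p)/2)\}$, which follows from $\mathcal{I}_f = \mathrm{Var}[\log f(\mathbf{X})]$ and the beta distribution of $(1+\mathbf{X}^T\mathbf{R}^{-1}\mathbf{X}/\nu)^{-1}$; the helper `trigamma` and the name `shape_measure_t` are hypothetical and not part of the original text.

```python
import math


def trigamma(x):
    """Trigamma function for x > 0 (upward recurrence plus asymptotic series)."""
    result = 0.0
    while x < 6.0:                      # psi'(x) = psi'(x + 1) + 1/x^2
        result += 1.0 / (x * x)
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    # psi'(x) ~ 1/x + 1/(2x^2) + 1/(6x^3) - 1/(30x^5) + 1/(42x^7) - 1/(30x^9)
    series = inv * inv2 * (1.0 / 6 - inv2 * (1.0 / 30 - inv2 * (1.0 / 42 - inv2 / 30)))
    return result + inv + 0.5 * inv2 + series


def shape_measure_t(nu, p=1):
    """Variance of the log-density of the p-variate t with nu degrees of freedom."""
    return 0.25 * (nu + p) ** 2 * (trigamma(nu / 2) - trigamma((nu + p) / 2))


if __name__ == "__main__":
    for nu in (1, 2, 3, 4, 5, 50, 1000):
        print(nu, shape_measure_t(nu))
```

The printed values decrease from $\pi^2/3 \approx 3.29$ at $\nu = 1$ toward the normal value 1/2 as $\nu$ grows.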

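For the central $p$-variate t distribution with correlation matrix $\mathbf{R}$ and degrees of freedom $\nu$, it can be shown for $\lambda > p/(p+\nu)$ that

\[
\mathcal{I}_R(\lambda) = \frac{1}{1-\lambda}\log\left\{\frac{B((\nu\lambda+p\lambda-p)/2,\,p/2)}{B^{\lambda}(\nu/2,\,p/2)}\right\} + \frac{1}{2}\log\left\{(\pi\nu)^{p}\,|\mathbf{R}|\right\} - \log\Gamma\left(\frac{p}{2}\right),
\]

\[
\mathcal{I}_R'(\lambda) = \frac{1}{(1-\lambda)^2}\left[\log\left\{\frac{B((\nu\lambda+p\lambda-p)/2,\,p/2)}{B(\nu/2,\,p/2)}\right\} + \frac{(1-\lambda)(p+\nu)}{2}\,\psi\left(\frac{\nu\lambda+p\lambda-p}{2}\right) - \frac{(1-\lambda)(p+\nu)}{2}\,\psi\left(\frac{\nu\lambda+p\lambda}{2}\right)\right],
\]

and

\[
\mathcal{I}_f(\nu) = \frac{(\nu+p)^2}{4}\left\{\psi'\left(\frac{\nu}{2}\right) - \psi'\left(\frac{\nu+p}{2}\right)\right\}.
\]

For $p = 1$, these expressions reduce to those derived for the Student’s t distribution.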
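
A short sketch of the Rényi information of the central $p$-variate t distribution may help in checking the limiting behavior. It assumes the closed form $\mathcal{I}_R(\lambda) = \frac{1}{1-\lambda}\log\{B((\nu\lambda+p\lambda-p)/2,p/2)/B^{\lambda}(\nu/2,p/2)\} + \frac{1}{2}\log\{(\pi\nu)^p|\mathbf{R}|\} - \log\Gamma(p/2)$, obtained by integrating the $\lambda$th power of (1.1); the function name and argument layout are illustrative assumptions.

```python
import math


def log_beta(a, b):
    """Logarithm of the beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def renyi_information_t(lam, nu, p=1, log_det_R=0.0):
    """Renyi information of order lam (lam != 1) for the central p-variate t.

    Valid only for lam > p / (p + nu); log_det_R is log|R|.
    """
    if lam == 1:
        raise ValueError("use the Shannon entropy for lam = 1")
    a = 0.5 * (nu * lam + p * lam - p)
    if a <= 0:
        raise ValueError("the integral diverges unless lam > p / (p + nu)")
    log_ratio = log_beta(a, p / 2) - lam * log_beta(nu / 2, p / 2)
    scale_term = 0.5 * (p * math.log(math.pi * nu) + log_det_R)
    return log_ratio / (1.0 - lam) + scale_term - math.lgamma(p / 2)


if __name__ == "__main__":
    # Orders approaching 1 should approach the Shannon entropy log(4*pi)
    # of the standard Cauchy distribution.
    for lam in (2.0, 1.1, 1.001, 1.000001):
        print(lam, renyi_information_t(lam, nu=1))
```

For the standard Cauchy distribution ($\nu = p = 1$) the order-2 value is $\log(2\pi)$, since $\int f^2(x)\,dx = 1/(2\pi)$, and values of $\lambda$ close to 1 approach the Shannon entropy $\log(4\pi)$.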


1.18 Identities 


In one of the earliest papers on the subject, Dickey (1965, 1968) provided two multidimensional-integral identities involving the multivariate t distribution. The first identity expresses a moment of a product of multivariate t densities of the form (1.1) as an integral of dimension 1 less than the number of factors. Consider the product

\[
g(\mathbf{x}) = \prod_{k=1}^{K}\left[1+(\mathbf{x}-\boldsymbol{\mu}_k)^T\mathbf{R}_k(\mathbf{x}-\boldsymbol{\mu}_k)\right]^{-\nu_k/2}, \tag{1.37}
\]

where each $\mathbf{R}_k \ge 0$ and $\nu_k > 0$, and so each term may not have a finite integral. The identity seeks an expression for the complete $p$-dimensional integral of $s\cdot g$, where $s(\mathbf{x})$ is a polynomial in the coordinates of $\mathbf{x}$. Let $\mathbf{Y}$ be a $p$-variate normal random vector with the covariance matrix and mean vector given by

\[
\mathbf{D}_u^{-1} = \left(\sum_{k=1}^{K}u_k\mathbf{R}_k\right)^{-1}
\]

and

\[
\boldsymbol{\mu}_u = \mathbf{D}_u^{-1}\sum_{k=1}^{K}u_k\mathbf{R}_k\boldsymbol{\mu}_k,
\]

respectively. For given constants $c_k > 0$, $k = 1,\ldots,K$, let $u_\cdot = \sum_{k=1}^{K}c_ku_k$ and $u_k = v_ku_\cdot$. Then the quantity defined by $N_{s|u} = E(s(\mathbf{Y}))$ can be expanded as a polynomial in $1/u_\cdot$ as

\[
N_{s|u} = \sum_{j}h_j(v_1,\ldots,v_K)\,u_\cdot^{-j}.
\]
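Given this terminology, the identity can now be expressed as

\[
\int s(\mathbf{x})g(\mathbf{x})\,d\mathbf{x} = K_0\sum_{j}\Gamma\left(\frac{\nu_\cdot-p}{2}-j\right)\int_{\sigma}|\mathbf{D}_v|^{-1/2}\,h_j(v_1,\ldots,v_K)\left(\prod_{k=1}^{K}v_k^{\nu_k/2-1}\right)W_v^{\,j-(\nu_\cdot-p)/2}\,dv_1\cdots dv_{K-1}, \tag{1.38}
\]

where

\[
K_0 = \pi^{p/2}\prod_{k=1}^{K}\frac{1}{\Gamma(\nu_k/2)},\qquad
\nu_\cdot = \sum_{k=1}^{K}\nu_k,\qquad
\mathbf{D}_v = \sum_{k=1}^{K}v_k\mathbf{R}_k,
\]

\[
W_v = \sum_{k=1}^{K}v_k\left\{1+\boldsymbol{\mu}_k^T\mathbf{R}_k\boldsymbol{\mu}_k\right\} - \left(\sum_{k=1}^{K}v_k\mathbf{R}_k\boldsymbol{\mu}_k\right)^{T}\mathbf{D}_v^{-1}\left(\sum_{k=1}^{K}v_k\mathbf{R}_k\boldsymbol{\mu}_k\right),
\]

and $\sigma$ is the simplex

\[
\sigma = \left\{\mathbf{v}:\ \sum_{k=1}^{K}c_kv_k = 1,\ v_k \ge 0\right\}.
\]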


This identity has applications to inference concerning the location pa- 
rameters of a multivariate normal distribution. In the particular case 
K = 2, Ry = ypIp, and s = 1, (1.38) reduces to 


= (v.-p)/2p (Y1 V2 
f 00a = ore (4,22) 


where 


DE «a 
r (vı /2)T (v2/2) yy eager 
$F_1$ is Appell’s hypergeometric function of two variables defined by

\[
F_1(\alpha;\beta,\beta';\gamma;x,y) = \frac{\Gamma(\gamma)}{\Gamma(\alpha)\Gamma(\gamma-\alpha)}\int_0^1 t^{\alpha-1}(1-t)^{\gamma-\alpha-1}(1-tx)^{-\beta}(1-ty)^{-\beta'}\,dt \tag{1.40}
\]

(see, for example, Erdélyi et al., 1953), and $z_1$ and $z_2$ are the two real roots of the equation


Y 
2+ (n A P+ -1)2—m iene EE 


The integral (1.39) is proportional to a multivariate generalization of the Behrens-Fisher density. For an asymptotic expansion of (1.37) in powers of $\nu_k$, see Dickey (1967a).

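The second identity given by Dickey (1968) (see also Dickey, 1966b) expresses the density of a linear combination of independently distributed multivariate t vectors as an integral of dimension 1 less than the number of summands. Consider the $r$-variate vector $\boldsymbol{\delta}$ formed by the linear combination

\[
\boldsymbol{\delta} = \sum_{k=1}^{K}\mathbf{B}_k\mathbf{X}_k,
\]

where $\mathbf{X}_k$ are independent $q_k$-variate standard t random vectors with zero means, covariance matrix $\mathbf{I}_{q_k}$, and degrees of freedom $\nu_k$. Dickey (1968) showed that $\boldsymbol{\delta}$ has the representation

\[
\boldsymbol{\delta} = \left[\sum_{k=1}^{K}\nu_kU_k^{-1}\mathbf{B}_k\mathbf{B}_k^T\right]^{1/2}\mathbf{Y},
\]

where $U_k$ are independent chi-squared random variables with degrees of freedom $\nu_k$ and $\mathbf{Y}$ is an independent $r$-variate standard normal vector. As a consequence, $\boldsymbol{\delta}$ has the further representation

\[
\boldsymbol{\delta} = \left[\sum_{k=1}^{K}(\nu_k/\nu_\cdot)\,V_k^{-1}\mathbf{B}_k\mathbf{B}_k^T\right]^{1/2}\mathbf{W},
\]

where $\nu_\cdot = \sum_{k=1}^{K}\nu_k$, $V_k = U_k/\sum_{j=1}^{K}U_j$, and $\mathbf{W}$ is an independent $r$-variate standard t vector with degrees of freedom $\nu_\cdot$. If the matrix $\sum_k\mathbf{B}_k\mathbf{B}_k^T$ is nonsingular, the distribution of $\boldsymbol{\delta}$ is nondegenerate with the joint pdf

\[
f(\boldsymbol{\delta}) = C\int_{\sigma}\left(\prod_{k=1}^{K}v_k^{\nu_k/2-1}\right)\left[1+\boldsymbol{\delta}^T\left(\sum_{k=1}^{K}(\nu_k/v_k)\mathbf{B}_k\mathbf{B}_k^T\right)^{-1}\boldsymbol{\delta}\right]^{-(\nu_\cdot+r)/2}\left|\sum_{k=1}^{K}(\nu_k/v_k)\mathbf{B}_k\mathbf{B}_k^T\right|^{-1/2}dv_1\cdots dv_{K-1}, \tag{1.41}
\]

where

\[
C = \frac{\Gamma((\nu_\cdot+r)/2)}{\pi^{r/2}\,\Gamma(\nu_1/2)\cdots\Gamma(\nu_K/2)}
\]

and $\sigma$ is the simplex

\[
\sigma = \left\{(v_1,\ldots,v_K):\ \sum_{k=1}^{K}v_k = 1,\ v_k \ge 0\right\}.
\]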


This identity has applications in Behrens-Fisher problems. The version 
of (1.41) for K = 2 and By = fy is 


7 vi +p n+p 
O e a+?) 


v tp v.+pv.+p v. 
x Fy (=e, 9 Gt piane), 


where 
T ((v. + p) /2) 2\ 41/2 2\—(vitp)/2 
TPIT (11/2): -P (v2/2) (vbi) (v262) ’ 
$F_1$ is Appell’s hypergeometric function as defined in (1.40), and $z_1$ and $z_2$ are the two real roots of the equation


+ (LE oft 1), Lee 
v2 33 V2 83 vp} 
This special case is essentially equivalent to the two-factor version of 
(1.38). Moreover, (1.41) is a generalization of Ruben’s (1960) integral 
representation (in the univariate case) for the usual Behrens-Fisher den- 
sities. 


1.19 Some Special Cases 


A number of special cases of (1.1) have been studied in the literature in great detail. Cornish (1954), in his early paper, considered the special case of (1.1) when $\boldsymbol{\mu} = \mathbf{0}$ and $\mathbf{R}$ is given by the equicorrelation matrix

\[
\mathbf{R} = \begin{pmatrix}
1 & -1/p & \cdots & -1/p \\
-1/p & 1 & \cdots & -1/p \\
\vdots & \vdots & \ddots & \vdots \\
-1/p & -1/p & \cdots & 1
\end{pmatrix}.
\]


The following interesting properties were established:


• $\mathbf{X}^T\mathbf{R}^{-1}\mathbf{X}$ has the noncentral F distribution with degrees of freedom $p$ and $\nu$.

• $\mathbf{X}^T\mathbf{R}^{-1}\mathbf{X}$ has Fisher’s z distribution with degrees of freedom $p-q$ and $\nu$ when $\mathbf{X}$ is subject to the linearly independent homogeneous conditions represented by the equation $\mathbf{S}\mathbf{X} = \mathbf{0}$, where $\mathbf{S}$ is of order $q\times p$ and rank $q < p$.

• The cdf of the quadratic form $Q = \mathbf{X}^T\mathbf{A}\mathbf{X}$ when $\mathbf{A}$ is of rank $q \le p$ is given by


eaS S E) T a 


where $\mathbf{x}^T = (x_1,\ldots,x_q)$ and the domain of integration is defined by

\[
\sum_{i=1}^{q}\lambda_ix_i^2 \le Q,
\]

where $\lambda_i$ are the roots of the equation $|\mathbf{A}-\lambda\mathbf{R}^{-1}| = 0$ or, alternatively, the latent roots of the matrix $\mathbf{R}\mathbf{A}$. Consequently, the distribution of $\mathbf{X}^T\mathbf{A}\mathbf{X}$ is Fisher’s z with degrees of freedom $q$ and $\nu$ if and only if the nonzero latent roots of $\mathbf{R}\mathbf{A}$ are all equal to unity.

If the distribution of X is partitioned as in (1.11)-(1.13), then 


E(X:|X2) = -RI Rex, 
and 
Var (X;|X2) = v +x (Raz — Ra Ri Ruz) x p1. 
v+p-p -2 
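In the particular case $p_1 = 1$,

\[
E(X_1\,|\,\mathbf{X}_2) = -\frac{1}{2}\sum_{j=2}^{p}x_j,
\]

\[
\mathrm{Var}(X_1\,|\,\mathbf{X}_2) = \frac{(p+1)\nu + p\left\{\sum_{j=2}^{p}x_j^2+\frac{1}{2}\left(\sum_{j=2}^{p}x_j\right)^2\right\}}{2p\,(\nu+p-3)},
\]

\[
E[\mathrm{Var}(X_1\,|\,\mathbf{X}_2)] = \frac{\nu}{\nu-2}\,\frac{p+1}{2p},
\]

\[
\mathrm{Var}(\mathbf{X}_2) = \frac{\nu}{\nu-2}\,\mathbf{R}_{22},
\]

and

\[
\mathrm{Cov}(X_1,X_i) = -\frac{\nu}{(\nu-2)\,p},\qquad i = 2,\ldots,p.
\]

Furthermore, the residual variance of $X_1$ with respect to $\mathbf{X}_2$ is

\[
\frac{\nu}{\nu-2}\,\frac{p+1}{2p},
\]

and the partial correlation coefficient of $X_1$ with respect to $X_i$ is $-1/2$.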
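
The constants appearing in this special case all stem from the inverse of the equicorrelation matrix with off-diagonal entries $-1/p$. The sketch below verifies, in exact rational arithmetic, that this inverse is $\{p/(p+1)\}(\mathbf{I}+\mathbf{J})$, where $\mathbf{J}$ is the matrix of ones, and then reads off the regression coefficients, the residual scale factor, and the partial correlation. The function names are hypothetical, and the standard identities linking a precision matrix to regression coefficients and partial correlations are assumed.

```python
import math
from fractions import Fraction


def cornish_R(p):
    """Equicorrelation matrix with unit diagonal and off-diagonal entries -1/p."""
    return [[Fraction(1) if i == j else Fraction(-1, p) for j in range(p)]
            for i in range(p)]


def cornish_R_inverse(p):
    """Closed-form inverse {p/(p+1)}(I + J), with J the matrix of ones."""
    c = Fraction(p, p + 1)
    return [[2 * c if i == j else c for j in range(p)] for i in range(p)]


def matmul(A, B):
    """Plain matrix product for square lists of Fractions."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def regression_coefficients(P):
    """Coefficients of X_1 on X_2, ..., X_p from the precision matrix P."""
    return [-P[0][j] / P[0][0] for j in range(1, len(P))]


def partial_correlation(P, i, j):
    """Partial correlation of X_i and X_j given the rest, from the precision P."""
    return -float(P[i][j]) / math.sqrt(float(P[i][i] * P[j][j]))


if __name__ == "__main__":
    p = 5
    P = cornish_R_inverse(p)
    print(regression_coefficients(P))   # each coefficient equals -1/2
    print(1 / P[0][0])                  # residual scale factor (p + 1)/(2p)
    print(partial_correlation(P, 0, 1))
```

The residual scale factor $(p+1)/(2p)$ printed above is the quantity that is multiplied by $\nu/(\nu-2)$ in the residual variance.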


Patil and Kovner (1968) provided a detailed study of the trivariate t density

\[
f(x_1,x_2,x_3) = \frac{\Gamma((n+3)/2)}{(n\pi)^{3/2}\sqrt{1-\rho^2}\,\Gamma(n/2)}\left(1+\frac{x_1^2-2\rho x_1x_2+x_2^2}{n(1-\rho^2)}+\frac{x_3^2}{n}\right)^{-(n+3)/2}.
\]

Among other results, Taylor series expansions, in powers of $1/n$, of the density and associated probabilities in rectangles were given.
The Characteristic Function


The characteristic function (cf) of the univariate Student’s t distribution for odd degrees of freedom was derived by Fisher and Healy (1956). Ifram (1970) gave a general result for all degrees of freedom, but Pestana (1977) pointed out that this result is not quite correct. More recent derivations are presented in Dreier and Kotz (2002). Here we discuss two independent results on the characteristic function for the multivariate t distribution. The first one, due to Sutradhar (1986, 1988a), provides a series representation for the cf while the other, due to Joarder and Ali (1996), derives an expression in terms of the MacDonald function. The expressions given are rather complicated; thus, further research and possible simplifications may be desirable.


2.1 Sutradhar’s Approach 
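Let $\mathbf{X}$ be distributed according to

\[
f(\mathbf{x}) = \frac{\Gamma((\nu+p)/2)}{(\pi\nu)^{p/2}\,\Gamma(\nu/2)\,|\mathbf{R}|^{1/2}}\left[1+\frac{1}{\nu}(\mathbf{x}-\boldsymbol{\mu})^T\mathbf{R}^{-1}(\mathbf{x}-\boldsymbol{\mu})\right]^{-(\nu+p)/2}. \tag{2.1}
\]

Consider the transformation $\mathbf{Y} = \mathbf{R}^{-1/2}(\mathbf{X}-\boldsymbol{\mu})$. It then follows that the joint pdf of $\mathbf{Y}$ is

\[
f_{\mathbf{Y}}(\mathbf{y}) = \frac{\Gamma((\nu+p)/2)}{(\pi\nu)^{p/2}\,\Gamma(\nu/2)}\left(1+\frac{\mathbf{y}^T\mathbf{y}}{\nu}\right)^{-(\nu+p)/2}.
\]

The cf of $\mathbf{Y}$ is

\[
\phi_{\mathbf{Y}}(\mathbf{t};\nu) = \frac{\Gamma((\nu+p)/2)}{(\pi\nu)^{p/2}\,\Gamma(\nu/2)}\int_{\mathbb{R}^p}\exp\left(i\mathbf{t}^T\mathbf{y}\right)\left(1+\frac{\mathbf{y}^T\mathbf{y}}{\nu}\right)^{-(\nu+p)/2}dy_1\,dy_2\cdots dy_p.
\]

To evaluate this integral, Sutradhar (1986) makes the orthogonal transformation $\mathbf{Y} = \mathbf{T}\mathbf{Z}$, where the first column of the $p\times p$ matrix $\mathbf{T}$ is the vector

\[
\left(\frac{t_1}{\|\mathbf{t}\|},\frac{t_2}{\|\mathbf{t}\|},\ldots,\frac{t_p}{\|\mathbf{t}\|}\right)^T
\]

with $\|\mathbf{t}\| = \sqrt{\mathbf{t}^T\mathbf{t}}$. It follows that the cf of $\mathbf{Z}$ is given by

\[
\phi_{\mathbf{Z}}(\mathbf{t};\nu) = \frac{\nu^{\nu/2}\,\Gamma((\nu+p)/2)}{\pi^{p/2}\,\Gamma(\nu/2)}\int_{-\infty}^{\infty}\exp\left(i\|\mathbf{t}\|z_1\right)\left[\int\cdots\int\left(c_p+z_p^2\right)^{-(\nu+p)/2}dz_2\cdots dz_p\right]dz_1, \tag{2.2}
\]

where $z_k\in\mathbb{R}$, $k = 1,\ldots,p$, and $c_p = \nu+\sum_{k=1}^{p-1}z_k^2$. Successive integration of (2.2) with respect to $z_p, z_{p-1},\ldots,z_2$ yields

\[
\phi_{\mathbf{Z}}(\mathbf{t};\nu) = \frac{\nu^{\nu/2}\,\Gamma((\nu+1)/2)}{\sqrt{\pi}\,\Gamma(\nu/2)}\,I_1, \tag{2.3}
\]

where

\[
I_1 = \int_{-\infty}^{\infty}\exp\left(i\|\mathbf{t}\|w\right)\left(w^2+\nu\right)^{-(\nu+1)/2}dw. \tag{2.4}
\]

Note that $I_1$ is an improper integral along the real axis, where $w$ denotes a complex variable. For odd $\nu$, the integrand has poles of integer order while, for fractional and even $\nu$, the poles are of fractional order. Sutradhar (1986) evaluated $I_1$ separately for the three cases: odd $\nu$; even $\nu$; and fractional $\nu$, using the relations that

\[
\phi_{\mathbf{X}}(\mathbf{t};\nu) = \exp\left(i\mathbf{t}^T\boldsymbol{\mu}\right)\phi_{\mathbf{Y}}\left(\mathbf{R}^{1/2}\mathbf{t};\nu\right) \tag{2.5}
\]

and

\[
\phi_{\mathbf{Y}}(\mathbf{t};\nu) = \phi_{\mathbf{Z}}(\mathbf{t};\nu) \tag{2.6}
\]

to obtain the following expressions for $\phi_{\mathbf{X}}$.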
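For the case of odd $\nu$

\[
\phi_{\mathbf{X}}(\mathbf{t};\nu) = \frac{\sqrt{\pi}\,\Gamma((\nu+1)/2)\exp\left(i\mathbf{t}^T\boldsymbol{\mu}-\sqrt{\nu\,\mathbf{t}^T\mathbf{R}\mathbf{t}}\right)}{2^{\nu-1}\,\Gamma(\nu/2)}\sum_{k=1}^{m}\binom{2m-k-1}{m-k}\frac{\left(2\sqrt{\nu\,\mathbf{t}^T\mathbf{R}\mathbf{t}}\right)^{k-1}}{(k-1)!},
\]

where $m = (\nu+1)/2$. For the case of even $\nu$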


(=P ((v + 1)/2) exp (it p) 
VaT (v/2) Tea (m — k + 1/2) 


Sa (SE AmS Tl @-» 
j=0 \k=0, kj 
vtTRt T'(n+1) 
-I F ) (10g 4 -Fe h. ae 


where m = v/2. Finally, when v is of fractional order 


ọx(t;v) = 


m(—1)™v/2)-™P ((v + 1)/2) exp (it? u) 
XT (v/2) sin(Em)E (€ + 1/2) k- (444 — k) 


2n 
< |1 /vVvtTRt 2 Ty (n -— E- k) 
«> [3 / 2 l SER 


_ (eT Rt)§ Ilgo (n-k) Y 


ọx(t;v) = 


XT (n+1+€) (2.8) 


where $m = [(\nu+1)/2]$ is the integer part of $(\nu+1)/2$ and $\xi = (\nu/2)-m$ is such that $0 < |\xi| < 1/2$.

Both (2.7) and (2.8) involve infinite series. Checking the convergence 
of these series is an open problem. For another series representation for 
the cf of the multivariate t, see Javier and Srivastava (1988). 
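
For odd degrees of freedom the cf is a finite sum and can be evaluated directly. The sketch below assumes the odd-$\nu$ form $\phi_{\mathbf{X}}(\mathbf{t};\nu) = \{\sqrt{\pi}\,\Gamma((\nu+1)/2)/(2^{\nu-1}\Gamma(\nu/2))\}\exp(i\mathbf{t}^T\boldsymbol{\mu}-a)\sum_{k=1}^{m}\binom{2m-k-1}{m-k}(2a)^{k-1}/(k-1)!$ with $a = \sqrt{\nu\,\mathbf{t}^T\mathbf{R}\mathbf{t}}$ and $m = (\nu+1)/2$; the function name and the list-based matrix layout are illustrative choices.

```python
import cmath
import math


def mvt_cf_odd(t, mu, R, nu):
    """Characteristic function of the p-variate t for odd integer nu.

    t and mu are sequences of length p; R is a p x p nested list.
    """
    if nu < 1 or nu % 2 != 1:
        raise ValueError("nu must be a positive odd integer")
    p = len(t)
    quad = sum(t[i] * R[i][j] * t[j] for i in range(p) for j in range(p))
    a = math.sqrt(nu * quad)                        # sqrt(nu * t'Rt)
    m = (nu + 1) // 2
    const = (math.sqrt(math.pi) * math.gamma((nu + 1) / 2)
             / (2 ** (nu - 1) * math.gamma(nu / 2)))
    series = sum(math.comb(2 * m - k - 1, m - k) * (2 * a) ** (k - 1)
                 / math.factorial(k - 1) for k in range(1, m + 1))
    shift = sum(ti * mi for ti, mi in zip(t, mu))   # t'mu
    return cmath.exp(1j * shift - a) * const * series


if __name__ == "__main__":
    print(mvt_cf_odd([0.7], [0.0], [[1.0]], 3))
```

In the univariate standard case the sum reproduces the familiar closed forms $e^{-|t|}$ for $\nu = 1$, $e^{-\sqrt{3}|t|}(1+\sqrt{3}|t|)$ for $\nu = 3$, and $e^{-\sqrt{5}|t|}(1+\sqrt{5}|t|+5t^2/3)$ for $\nu = 5$.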


2.2 Joarder and Ali’s Approach 


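An integral representation of the MacDonald function is

\[
K_{\alpha}(r) = \left(\frac{2}{r}\right)^{\alpha}\frac{\Gamma(\alpha+1/2)}{\sqrt{\pi}}\int_0^{\infty}\left(1+u^2\right)^{-(\alpha+1/2)}\cos(ru)\,du, \tag{2.9}
\]

where $r > 0$ and $\alpha > -1/2$ (see, for example, Watson, 1958, page 172).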
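A series representation of $K_{\alpha}(r)$ for $r > 0$ and positive integer $\alpha$ is given by

\[
\begin{aligned}
K_{\alpha}(r) ={}& \sum_{j=0}^{\alpha-1}\frac{(-1)^j\,2^{\alpha-2j-1}\,(\alpha-j-1)!}{j!}\,r^{2j-\alpha} \\
&+ (-1)^{\alpha}\sum_{j=0}^{\infty}\frac{2^{-\alpha-2j-1}\left\{\psi(1+j)+\psi(1+\alpha+j)+\log 4\right\}}{j!\,(\alpha+j)!}\,r^{2j+\alpha} \\
&- (-1)^{\alpha}\sum_{j=0}^{\infty}\frac{2^{-\alpha-2j}\log(r)}{j!\,(\alpha+j)!}\,r^{2j+\alpha}
\end{aligned} \tag{2.10}
\]

(see, for example, Lebedev, 1965, pages 107 and 110). A series representation of $K_{\alpha}(r)$ for $r > 0$ and nonintegral positive values of $\alpha$ is

\[
K_{\alpha}(r) = 2^{\alpha-1}\,\Gamma(\alpha)\sum_{j=0}^{\infty}\frac{r^{2j-\alpha}}{4^j\,j!\,(1-\alpha)_j} + 2^{-\alpha-1}\,\Gamma(-\alpha)\sum_{j=0}^{\infty}\frac{r^{2j+\alpha}}{4^j\,j!\,(1+\alpha)_j} \tag{2.11}
\]

(see, for example, Spanier and Oldham, 1987, Chapter 51), where $(c)_j = c(c+1)\cdots(c+j-1)$ denotes the ascending factorial. Using (2.9), Joarder and Ali (1996) rewrote the integral (2.4) as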


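\[
\begin{aligned}
I_1 &= \nu^{-(\nu+1)/2}\int_{-\infty}^{\infty}\left\{\cos(\|\mathbf{t}\|w)+i\sin(\|\mathbf{t}\|w)\right\}\left(1+\frac{w^2}{\nu}\right)^{-(\nu+1)/2}dw \\
&= 2\nu^{-(\nu+1)/2}\int_0^{\infty}\cos(\|\mathbf{t}\|w)\left(1+\frac{w^2}{\nu}\right)^{-(\nu+1)/2}dw \\
&= \frac{2\sqrt{\pi}}{\Gamma((\nu+1)/2)}\left(\frac{\|\mathbf{t}\|}{2\sqrt{\nu}}\right)^{\nu/2}K_{\nu/2}\left(\sqrt{\nu}\,\|\mathbf{t}\|\right).
\end{aligned}
\]

Thus, using (2.3), one obtains

\[
\phi_{\mathbf{Z}}(\mathbf{t}) = \frac{\|\sqrt{\nu}\,\mathbf{t}\|^{\nu/2}}{2^{\nu/2-1}\,\Gamma(\nu/2)}\,K_{\nu/2}\left(\|\sqrt{\nu}\,\mathbf{t}\|\right).
\]

Hence, using the relationships (2.5) and (2.6), one arrives at the expression for the cf given by Joarder and Ali (1996)

\[
\phi_{\mathbf{X}}(\mathbf{t}) = \exp\left(i\mathbf{t}^T\boldsymbol{\mu}\right)\frac{\|\sqrt{\nu}\,\mathbf{R}^{1/2}\mathbf{t}\|^{\nu/2}}{2^{\nu/2-1}\,\Gamma(\nu/2)}\,K_{\nu/2}\left(\|\sqrt{\nu}\,\mathbf{R}^{1/2}\mathbf{t}\|\right). \tag{2.12}
\]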


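Joarder and Ali also provided expansions of this cf using the series representations (2.10) and (2.11). For positive and even $\nu$, applying (2.10), one obtains

\[
\phi_{\mathbf{X}}(\mathbf{t}) = \exp\left(i\mathbf{t}^T\boldsymbol{\mu}\right)\left\{\sum_{j=0}^{\nu/2-1}C_1(j)\,\|\sqrt{\nu}\,\mathbf{R}^{1/2}\mathbf{t}\|^{2j} + \sum_{j=0}^{\infty}C_2(j)\,\|\sqrt{\nu}\,\mathbf{R}^{1/2}\mathbf{t}\|^{2j+\nu} - 2\sum_{j=0}^{\infty}C_3(j)\,\|\sqrt{\nu}\,\mathbf{R}^{1/2}\mathbf{t}\|^{2j+\nu}\log\left(\|\sqrt{\nu}\,\mathbf{R}^{1/2}\mathbf{t}\|\right)\right\},
\]

where

\[
C_1(j) = \frac{(-1)^j\,(\nu/2-j-1)!}{4^j\,j!\,(\nu/2-1)!},
\]

\[
C_2(j) = \frac{(-1)^{\nu/2}\left\{\psi(1+j)+\psi(1+\nu/2+j)+\log 4\right\}}{4^{j+\nu/2}\,(\nu/2-1)!\,j!\,(\nu/2+j)!},
\]

and

\[
C_3(j) = \frac{(-1)^{\nu/2}}{4^{j+\nu/2}\,(\nu/2-1)!\,j!\,(\nu/2+j)!}.
\]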
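For positive and odd or fractional $\nu$, applying (2.11), the cf (2.12) becomes

\[
\phi_{\mathbf{X}}(\mathbf{t}) = \exp\left(i\mathbf{t}^T\boldsymbol{\mu}\right)\sum_{j=0}^{\infty}\left\{D_1(j)\,\|\sqrt{\nu}\,\mathbf{R}^{1/2}\mathbf{t}\|^{2j} + D_2(j)\,\|\sqrt{\nu}\,\mathbf{R}^{1/2}\mathbf{t}\|^{2j+\nu}\right\},
\]

where

\[
D_1(j) = \frac{1}{4^j\,j!\,(1-\nu/2)_j}
\]

and

\[
D_2(j) = \frac{2^{-\nu}\,4^{-j}\,\Gamma(-\nu/2)}{\Gamma(\nu/2)\,j!\,(1+\nu/2)_j}.
\]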


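Since the univariate Student’s t, multivariate Cauchy, and Pearson’s type VII are all particular cases of (2.1), the corresponding cfs in terms of the MacDonald function can be obtained from (2.12). They are as follows:


• For the univariate Student’s t distribution with the pdf

\[
f(x) = \frac{\Gamma((1+\nu)/2)}{\sqrt{\nu\pi}\,\Gamma(\nu/2)}\left(1+\frac{x^2}{\nu}\right)^{-(1+\nu)/2}
\]

(where $x\in\mathbb{R}$ and $\nu > 0$), the cf in terms of the MacDonald function is

\[
\phi_X(t) = \frac{|\sqrt{\nu}\,t|^{\nu/2}}{2^{\nu/2-1}\,\Gamma(\nu/2)}\,K_{\nu/2}\left(\sqrt{\nu}\,|t|\right)
\]

(compare with Dreier and Kotz, 2002).

• For the $p$-variate Cauchy distribution with the joint pdf

\[
f(\mathbf{x}) = \frac{\Gamma((1+p)/2)}{\pi^{(1+p)/2}\,|\mathbf{R}|^{1/2}}\left[1+(\mathbf{x}-\boldsymbol{\mu})^T\mathbf{R}^{-1}(\mathbf{x}-\boldsymbol{\mu})\right]^{-(1+p)/2}
\]

(where $\mathbf{x}\in\mathbb{R}^p$), the cf is

\[
\phi_{\mathbf{X}}(\mathbf{t}) = \exp\left(i\mathbf{t}^T\boldsymbol{\mu}\right)\sqrt{2/\pi}\;\|\mathbf{R}^{1/2}\mathbf{t}\|^{1/2}\,K_{1/2}\left(\|\mathbf{R}^{1/2}\mathbf{t}\|\right)
= \exp\left(i\mathbf{t}^T\boldsymbol{\mu}-\|\mathbf{R}^{1/2}\mathbf{t}\|\right),
\]

which follows by using the result that $K_{1/2}(r) = \sqrt{\pi/(2r)}\exp(-r)$ (see, for example, Tranter, 1968, page 19).

• For the $p$-variate Pearson type VII distribution with the joint pdf

\[
f(\mathbf{x}) = \frac{\Gamma(m)}{(\pi\nu)^{p/2}\,\Gamma(m-p/2)\,|\mathbf{R}|^{1/2}}\left[1+\frac{1}{\nu}(\mathbf{x}-\boldsymbol{\mu})^T\mathbf{R}^{-1}(\mathbf{x}-\boldsymbol{\mu})\right]^{-m}
\]

(where $\mathbf{x}\in\mathbb{R}^p$, $m > p/2$, and $\nu > 0$), the cf becomes

\[
\phi_{\mathbf{X}}(\mathbf{t}) = \exp\left(i\mathbf{t}^T\boldsymbol{\mu}\right)\frac{\|\sqrt{\nu}\,\mathbf{R}^{1/2}\mathbf{t}\|^{m-p/2}}{2^{m-p/2-1}\,\Gamma(m-p/2)}\,K_{m-p/2}\left(\|\sqrt{\nu}\,\mathbf{R}^{1/2}\mathbf{t}\|\right).
\]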


2.3 Lévy Representation 


Infinite divisibility of the univariate Student’s t distribution was first proved by Grosswald (1976); see also Kelker (1971) for a partial result. Later, Halgreen (1979) established the Lévy representation of its cf. For a multivariate t, Takano (1994) provided the first proof of infinite divisibility and the corresponding Lévy representation. Consider the standard case $\boldsymbol{\mu} = \mathbf{0}$ and $\mathbf{R} = \mathbf{I}_p$. In this case, after suitable transformation, the joint pdf (2.1) can be written in the form

\[
f(\mathbf{x}) = \frac{\Gamma(m+p/2)}{\pi^{p/2}\,\Gamma(m)}\left(1+\|\mathbf{x}\|^2\right)^{-m-p/2}.
\]

The corresponding cf is

\[
\phi(\mathbf{t}) = \frac{2^{1-m}}{\Gamma(m)}\,\|\mathbf{t}\|^{m}\,K_m(\|\mathbf{t}\|). \tag{2.13}
\]


Takano (1994) derived the Lévy representation of (2.13) in the form 


= es BERS ee T, eee E 
ee a| [ep {ort ie 
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«| f” om(2uw)Lnja (VZ Ix I) dw ay zo 


where 
=. / {Ja (Va) + Ya (v2)}, 
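\[
L_{\alpha}(z) = (2\pi)^{-\alpha}z^{\alpha}K_{\alpha}(z),
\]

\[
J_{\alpha}(x) = \frac{(x/2)^{\alpha}}{\sqrt{\pi}\,\Gamma(\alpha+1/2)}\int_{-1}^{1}\left(1-z^2\right)^{\alpha-1/2}\exp(ixz)\,dz,
\]

and

\[
Y_{\alpha}(x) = \frac{1}{\sin(\alpha\pi)}\left\{\cos(\alpha\pi)\,J_{\alpha}(x)-J_{-\alpha}(x)\right\}.
\]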


Note that J,(-) and Y,(-) are the Bessel functions of the first and second 
kinds, respectively, of order m. 
Now consider (2.13) itself as a joint pdf

\[
f_m(\mathbf{x}) = C\,\|\mathbf{x}\|^{m}\,K_m(\|\mathbf{x}\|), \tag{2.14}
\]

where the normalizing constant $C$ is

\[
C = \frac{2^{1-m-p/2}}{(2\pi)^{p/2}\,\Gamma(m+p/2)}.
\]

Using properties of the MacDonald function $K(\cdot)$, (2.14) can be reduced to the simpler forms


fR) = oF exp (- Il x Il); 


frais) = Cy Sex (- E seer ay HI 


1)! 
ha) = oy ree Px 


sot Fe (tet 
L«klin+k)i\ 2 


x {log (Het) — sul + &) — TARTEL 
Takano (1994) established further that the joint pdf (2.14) is also infinitely divisible and that its cf

\[
\phi(\mathbf{t}) = \left(1+\|\mathbf{t}\|^2\right)^{-m-p/2}
\]

admits the Lévy representation

\[
\phi(\mathbf{t}) = \exp\left[(2m+p)\int_{\mathbb{R}^p\setminus\{\mathbf{0}\}}\left\{\exp\left(i\mathbf{t}^T\mathbf{x}\right)-1\right\}\frac{L_{p/2}(\|\mathbf{x}\|)}{\|\mathbf{x}\|^{p}}\,d\mathbf{x}\right],
\]

where $L_{p/2}(\cdot)$ is as defined above.
Linear Combinations, Products, and Ratios


3.1 Linear Combinations 


The distribution of linear combinations of independent t random vari- 
ables has been studied by numerous authors, among them Fisher (1935), 
Chapman (1950), Fisher and Healy (1956), Ruben (1960), Patil (1965), 
Ghosh (1975), and Walker and Saw (1978). Johnson and Weerahandi 
(1988) tackled the distribution of linear combinations for multivariate t 
random vectors. Their results are included here for completeness and to 
motivate further multivariate extensions. We hope that our readers will 
benefit from studying this material, which contains fruitful ideas and 
also refers to the original papers for further details. 

Chapman (1950) considered the difference $D = X_1-X_2$ of two independent t random variables $X_j$ with common degrees of freedom $\nu$. If $\nu$ is odd, then it is known that the characteristic function of $Y_j = X_j/\sqrt{\nu}$ is

\[
\phi_{\nu}(t) = E[\exp(itY_j)] = \frac{\sqrt{\pi}\exp(-|t|)}{2^{\nu-1}\,\Gamma(\nu/2)}\sum_{k=0}^{(\nu-1)/2}\frac{((\nu-1)/2+k)!}{k!\,((\nu-1)/2-k)!}\,|2t|^{(\nu-1)/2-k}. \tag{3.1}
\]

Using this representation, Chapman provided the following general expression for the pdf of $W = D/\sqrt{\nu}$


flu) = xf exp{-ilv + Dhed, 


which may be integrated to obtain the pdf of D in a closed form for values 
such as v = 1,3,5, and so on. Chapman tabulated the distribution of D 
for v = 1,3, 5, 7,9, 11. 
Fisher and Healy (1956) considered the mixture $D = a_1X_1+a_2X_2$ of two independent t random variables $X_j$ with degrees of freedom $\nu_j$ when $a_j > 0$, $j = 1,2$. It is obvious that the characteristic function of $D$ is the product

\[
\phi_{\nu_1}(a_1|t|)\,\phi_{\nu_2}(a_2|t|),
\]

where $\phi_{\nu}(\cdot)$ is as defined in (3.1). Since, apart from the exponential factor, the product is simply a polynomial in $|t|$ of degree $(\nu_1+\nu_2)/2-1$, it can be expanded in a finite series of terms of $\phi_m(\cdot)$ in which the highest value of $m = \nu_1+\nu_2-1$. For example, in the special case $\nu_1 = 3$ and $\nu_2 = 5$, one can write


exp {(sin 9 + cos 6) t} 3 (sin 6t) ds (cos 8t) 


es (ene 


575 — 10V3tan6 + 9V5 tan? 0 ; 
eee [(V3 sing + V5 cos 8) t] 


ay (ages) « [ (v3sing + V5cos8) t] ; 


from this one can easily deduce the pdf of D. 

Ruben (1960) provided results on the distribution of $D = X_1\sin\theta-X_2\cos\theta$, when $X_j$ are independent t random variables with degrees of freedom $\nu_j$ and $\theta$ is a fixed angle between 0 and 90 degrees. This statistic was originally proposed by Fisher (1935) as the basis for testing or estimating the difference in means of two unconnected and totally unknown normal populations, the “fiducial distribution” of the difference between the latter quantity and the corresponding sample mean difference, when suitably standardized, being supposed to be that of the statistic $D$. Ruben obtained the pdf of $D$ in the integral form


1 1 fı d?e? (s r (vı tve+1)/2 
o VM +v2B((vı + v2)/2, 1/2) vi +n 
(¥1/2)—1(4 — ş)2/2)-1 
8 s 
PO Bon A i 


where 


(v + v2)s(1 — s) 
(1 — s) sin? 6 + v28 cos? 80` 


p(s) = 
It follows directly from (3.2) that $D$ may be expressed in the form

\[
\frac{X}{\psi(S)},
\]

where $X$ is a Student’s t random variable with degrees of freedom $\nu_1+\nu_2$ and $S$ is a beta variable with parameters $\nu_1/2$ and $\nu_2/2$, that is, a variable with pdf given by the second term under the integral sign in (3.2), with the first term under the integral sign representing the conditional pdf of $X/\psi(S)$ for fixed $S$.
In the special case $\nu_1 = \nu_2 = \nu$ and $\theta = 45$ degrees, (3.2) reduces to


2T(v + 1/2)P((v + 1)/2) lv+lvp d? 
= Aea o a a ak =i eric ae tomers 
f(a) vy MPu/Dw D 2 te 9 at a 
where ${}_2F_1$ denotes the Gauss hypergeometric function. By using the appropriate four of the group of 24 transformations of the hypergeometric function ${}_2F_1$ (see, for example, Whittaker and Watson, 1952, page 284), the above pdf may be expressed in the following three additional ways
_ [ZT +.1/2)P((v +. 1)/2) Py” 
A = YS emne +S) 


l-v lv+2 @ 
alar ah W 
_  J2T(v +1/2)T((v + 1)/2) gN TCH 
a = Yo oari (+E) 


11 2 æ 
x oi (645.5 ) , (3.4) 


2 PFZ 
and 


_ [2T +1/2T(( + 1)/2) g2 \ Utne 
ey = ar (E) 


x oF l+v l-v v+2 d 
eg 2? 2 a Op 


(3.5) 


Note that (3.3) and (3.5) may each be expanded as a terminating series 
(refer to the definition of the Gauss hypergeometric function) when v is 
odd, and also that (3.4) is expressed as the product of the pdf of a t ran- 
dom variable with degrees of freedom 2v and the Gauss hypergeometric 
function. 
In the special case $\nu_1 = \infty$ and $\nu_2 = \nu$, (3.2) reduces to


T(v)0((v + 1)/2) fv +1 1 d 
-=R +a a 
V2r sin br (v/2)T (v + 1/2) 2 2 2sin’ 6 
where ${}_1F_1$ is the confluent hypergeometric function. Using Kummer’s first transformation for the confluent hypergeometric function, one can obtain the alternative form
r 1)/2 d? 
o OAE DIONE M 
vV2r sin OL (v/2)L (v + 1/2) 2sin* 0 


x 1A (j+ t in) 
2° 2? Qsin?@) 
Ruben (1960) also provided expressions for the cdf of D, but these were 
infinite series involving incomplete gamma function ratios. For tables of 
percentage points of D, see Sukhatme (1938), Fisher and Yates (1943), 
and Weir (1966). 

Ghosh (1975) provided explicit expressions for the cdf in terms of simple hypergeometric functions when $D = X_1-X_2$ and $X_j$ are independent Student’s t random variables with common degrees of freedom $\nu \le 4$. In particular, Ghosh obtained the following expressions for $\Pr(0 < D < d)$


tenn dg 
T 2J’ 


f(a) 


a {(1+q)B(q) ~ (1- )K(@)}, 
2/3d(18+d?) 1 d 
rape ee (aa) l 


and 


1 
——— ¢ (8p — 31p? + 48p? + 5p + 2) E 
ag è p’ + 48p° + 5p + 2) E(p) 
-t-10 + 98 + +2) 40}, 


for $\nu = 1$, $\nu = 2$, $\nu = 3$, and $\nu = 4$, respectively, where $p = d^2/(16+d^2)$, $q = d^2/(8+d^2)$,

\[
E(x) = \int_0^{\pi/2}\sqrt{1-x\sin^2 s}\;ds
\]

is the complete elliptic integral of the second kind, and

\[
K(x) = \int_0^{\pi/2}\frac{ds}{\sqrt{1-x\sin^2 s}}
\]

is the complete elliptic integral of the first kind. Similar expressions for $\Pr(0 < D < d)$ as a finite sum of terms can be obtained for any positive integer $\nu$. Ghosh also provided a tabulation of the numerical values of $\Pr(D < d)$ for $\nu = 1(1)20$ and $d = 0.0(0.5)10.0$.
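
The complete elliptic integrals entering these expressions are straightforward to evaluate by quadrature. The following sketch uses composite Simpson integration of the defining integrals; the function names are hypothetical. It also includes the case $\nu = 1$, where the difference of two independent standard Cauchy variables is Cauchy with scale 2, so that $\Pr(0 < D < d) = \pi^{-1}\arctan(d/2)$; this closed form is a standard fact stated here as an assumption about what the first of Ghosh's expressions reduces to.

```python
import math


def simpson(f, a, b, n=2000):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def ellip_E(x):
    """Complete elliptic integral of the second kind, parameter 0 <= x <= 1."""
    return simpson(lambda s: math.sqrt(max(0.0, 1 - x * math.sin(s) ** 2)),
                   0.0, math.pi / 2)


def ellip_K(x):
    """Complete elliptic integral of the first kind, parameter 0 <= x < 1."""
    if not 0 <= x < 1:
        raise ValueError("K(x) is finite only for 0 <= x < 1")
    return simpson(lambda s: 1 / math.sqrt(1 - x * math.sin(s) ** 2),
                   0.0, math.pi / 2)


def cauchy_difference_prob(d):
    """Pr(0 < D < d) for D = X1 - X2 with independent standard Cauchy X1, X2."""
    return math.atan(d / 2) / math.pi


if __name__ == "__main__":
    d = 3.0
    p, q = d ** 2 / (16 + d ** 2), d ** 2 / (8 + d ** 2)
    print(ellip_E(p), ellip_K(p), ellip_E(q), ellip_K(q))
    print(cauchy_difference_prob(d))
```

Note that the argument $x$ here is the parameter (the coefficient of $\sin^2 s$), matching the definitions above, and not the modulus used in some software libraries.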
Walker and Saw (1978) expressed the cdf of linear combinations of any number of Student’s t random variables with odd degrees of freedom as a mixture of t distributions. Define the linear combination as

\[
D = \sum_{k=1}^{n}\frac{a_k}{\sqrt{\nu_k}}\,X_k,
\]

where $a_k > 0$, $a_1+\cdots+a_n = 1$, and $X_k$ are independent Student’s t random variables with degrees of freedom $\nu_k = 2m_k+1$, $m_k = 0,1,2,\ldots$. Construct a matrix $\mathbf{Q}$ whose element in row $i$ and column $j$ is the coefficient of $\exp(-|t|)\,|t|^j$ in $\phi_{2i+1}(t)$ (see equation (3.1)), that is,

\[
Q_{ij} = \begin{cases}
\dfrac{i!\,(2i-j)!\,2^j}{(2i)!\,j!\,(i-j)!} & \text{if } j = 0,1,\ldots,i, \\[2ex]
0 & \text{if } j > i,
\end{cases}
\qquad i = 0,1,2,\ldots.
\]

The characteristic function of $D$ when $\nu_k = 2m_k+1$ can be written as

\[
\phi(t) = E[\exp(itD)] = \prod_{k=1}^{n}\phi_{\nu_k}(a_kt),
\]

and since $\exp(|t|)\,\phi(t)$ is a polynomial in $|t|$, one may obtain a vector $\boldsymbol{\lambda}$ such that

\[
\phi(t) = \exp(-|t|)\sum_{k=0}^{s}\lambda_k\,|t|^k.
\]

Walker and Saw (1978) showed that the cdf of $D$ can be expressed as the weighted sum

\[
\Pr(D \le d) = \sum_{k=0}^{s}\eta_kH_k(d),
\]

where

\[
\boldsymbol{\eta}^T = \boldsymbol{\lambda}^T\mathbf{Q}^{-1},\qquad s = \sum_{k=1}^{n}m_k,
\]

and

\[
H_k(d) = \Pr\left[X_{2k+1} \le \sqrt{2k+1}\;d\right].
\]

This result can be used to calculate probabilities of $D$ utilizing only tables of the Student’s t distribution.
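
The mixture weights can be computed with a few lines of polynomial arithmetic. The sketch below is a minimal illustration under stated assumptions: each summand is scaled so that its cf is $\phi_{\nu_k}(a_kt)$ with $\phi_{\nu}$ as in (3.1), the coefficient of $\exp(-|t|)\,|t|^j$ in $\phi_{2i+1}(t)$ is taken to be $i!\,(2i-j)!\,2^j/\{(2i)!\,j!\,(i-j)!\}$ (which follows from (3.1)), and the weights solve the triangular system $\boldsymbol{\eta}^T\mathbf{Q} = \boldsymbol{\lambda}^T$. All function names are hypothetical.

```python
import math


def q_entry(i, j):
    """Coefficient of exp(-|t|) |t|^j in the cf of X / sqrt(2i + 1), X ~ t_{2i+1}."""
    if j > i:
        return 0.0
    return (math.factorial(i) * math.factorial(2 * i - j) * 2 ** j
            / (math.factorial(2 * i) * math.factorial(j) * math.factorial(i - j)))


def poly_mul(u, v):
    """Product of two polynomials given as coefficient lists (lowest degree first)."""
    out = [0.0] * (len(u) + len(v) - 1)
    for i, ui in enumerate(u):
        for j, vj in enumerate(v):
            out[i + j] += ui * vj
    return out


def mixture_weights(a, m):
    """Weights eta_0, ..., eta_s for coefficients a_k and degrees 2 m_k + 1.

    The a_k must be positive and sum to 1, so that the exponential factors
    of the individual cfs multiply to exp(-|t|).
    """
    lam = [1.0]
    for a_k, m_k in zip(a, m):
        # polynomial part of phi_{2 m_k + 1}(a_k t) in powers of |t|
        lam = poly_mul(lam, [q_entry(m_k, j) * a_k ** j for j in range(m_k + 1)])
    s = sum(m)
    eta = [0.0] * (s + 1)
    for j in range(s, -1, -1):          # back substitution: Q is lower triangular
        acc = lam[j] - sum(eta[k] * q_entry(k, j) for k in range(j + 1, s + 1))
        eta[j] = acc / q_entry(j, j)
    return eta


if __name__ == "__main__":
    print(mixture_weights([0.5, 0.5], [1, 1]))
```

For two equally weighted variables with 3 degrees of freedom each, the sketch returns weights 0, 0.25, and 0.75 on the components with 1, 3, and 5 degrees of freedom, and the weights always sum to 1 because every row of $\mathbf{Q}$ starts with 1.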

The general distribution of $D = X_1-X_2$ is very complicated when the $X_j$ are independent Student’s t random variables with $\nu_1 \ne \nu_2$. It is therefore natural to ask whether a reasonably good approximation can be found. Chapman (1950) suggested the simple approximation

\[
\Pr(D \le d) \approx \Phi\left(d\Big/\sqrt{\frac{\nu_1}{\nu_1-2}+\frac{\nu_2}{\nu_2-2}}\right),
\]

where $\Phi(\cdot)$ is the cdf of the standard normal distribution. This idea is, of course, prompted by the fact that $X_1\sqrt{(\nu_1-2)/\nu_1}$ and $X_2\sqrt{(\nu_2-2)/\nu_2}$ are both asymptotically standard normal random variables, approaching normality more rapidly than $X_1$ and $X_2$ do. However, a few calculations show that this approximation is quite unsatisfactory even for moderately large values of $\nu_1 > \nu_2 > 2$. Based on a t-approximation proposed by Patil (1965), an improved approximation is


Pr(D<d) ~ T, (5) 


where T, is the cdf of the Student’s t distribution of v degrees of freedom, 


2 
(22 Ve 4 sin a) 


vo—2 vı—2 
v = 4+ cos4 v2 sin4 v? 
Pa + 1 
(2—2)? (v2 —4) (1-2)7@1-4) 


(where n > v2 > 4), and 


yee cos? v2 $ sin? v 
~ p—2 Wy-2 m-2/)’ 
where v; > v2 > 4. Ghosh (1975) derived another approximation that 
requires only tables of the normal distribution 


= d \ _ dg(d/v2) | Qı(d) ý Q0 + LW 
Pr(D <d) = +(<) PRD [aw cs 2 A 
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+0(4)} | 


where 
Qi(d) = (14+) (d +10), 
2 
Qo(d) = ae (3d° + 98d* + 620d? + 168) 
+a (d? — 10d* + 36d? — 456) , 
1+)? | 10 8 4 “nag 72 
Q3(d) = TAT (d'° + 66d? + 1016d® — 1296d* — 65328d? — 141408) 
A(L +A) a 20 8 4 2 
ENE — 58d — = 
err. (3d'° — 58d? — 280d* + 6864d* — 700324?) 
1277\(1 +A) 
256’ 


and $\lambda = \nu_2/\nu_1$. Ghosh showed evidence to suggest that this is far more accurate than Patil’s approximation.

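Johnson and Weerahandi (1988) considered linear combinations of t random vectors in a Bayesian context. Suppose $\mathbf{y}_{11},\ldots,\mathbf{y}_{1m_1}$ and $\mathbf{y}_{21},\ldots,\mathbf{y}_{2m_2}$ are independent samples from two $p$-variate normal populations $N(\boldsymbol{\mu}_1,\boldsymbol{\Sigma}_1)$ and $N(\boldsymbol{\mu}_2,\boldsymbol{\Sigma}_2)$, respectively, where the population covariance matrices are unknown and not necessarily equal. Let $\bar{\mathbf{y}}_i$ and $\mathbf{S}_i$ denote the corresponding sample mean vectors and sample covariance matrices. Johnson and Weerahandi considered the distribution of the quadratic form

\[
Q = (\boldsymbol{\delta}-\mathbf{d})^T\mathbf{V}^{-1}(\boldsymbol{\delta}-\mathbf{d}),
\]

where $\boldsymbol{\delta} = \boldsymbol{\mu}_1-\boldsymbol{\mu}_2$, $\mathbf{d} = \bar{\mathbf{y}}_1-\bar{\mathbf{y}}_2$, and $\mathbf{V}$ is any $p\times p$ positive definite matrix. Note that $\boldsymbol{\mu}_i-\bar{\mathbf{y}}_i$ have the central $p$-variate t distribution with covariance matrix $\mathbf{S}_i/(m_i-p)$ and degrees of freedom $m_i-p$. Under the diffuse prior distribution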


P(Hy, 21, Mo, 22) = [Dil l[Zal, (3.6) 


the posterior cdf of Q can be expressed as 


FQ = LEM) Fan lag yay} P 


where n = m_1 + m_2 − 2p and the expectation is taken with respect to the 
beta random variable B, which is distributed as Be((m_1 − p)/2, (m_2 − 
p)/2). The w; are defined in terms of @ (an arbitrary constant) and À; 
by the recursive relation 


where 


and $\lambda_j$ are the ordered eigenvalues $\lambda_1 \le \cdots \le \lambda_p$ of the matrix 


l ys, v-2 4—1 


-1/2 -1/2 
mB mo(1 zp Sa 


In the particular case $V = cS_1 + (1 - c)S_2$, the $\lambda_j$ can be conveniently 
obtained by using the relation 


mə(1 — B) +m, BE; 


M4 = GamgB(L— B){e+ (1 — 08)" 


where $\xi_1,\ldots,\xi_p$ are the eigenvalues of $S_1^{-1}S_2$. In the univariate case, 
(3.7) reduces to give the posterior cdf of $Y = (\mu_1 - \mu_2) - (\bar{y}_1 - \bar{y}_2)$ as 


yVm, — mz — 2 


2 2 
mya) BE m E EA 
m, B mz 1-B 


F(y) = E Tm3+m2-2 


where $s_1^2$ and $s_2^2$ are the sample variances and the expectation is taken 
with respect to $B$, which is distributed as $\mathrm{Be}((m_1 - 1)/2, (m_2 - 1)/2)$. 
The result (3.7) can also be used to deduce the distribution of $U = (T_1 + 
T_2)^T (T_1 + T_2)$, where the $T_i$ are independent p-variate random vectors 
having the t distribution with covariance matrix $(a_i/m_i)I_p$ and degrees 
of freedom $m_i$. It turns out that the cdf of $U$ is 


u(m,+m2) B(1-B) ie 


F(u) = E |Fpm4m 
= [7 7 ( Pp a, + B (az — a) 


where the expectation is taken with respect to $B$, which is distributed 
as $\mathrm{Be}(m_1/2, m_2/2)$. Johnson and Weerahandi also established several 
interesting bounds on (3.7), one upper bound being 


nq 
ra 


where 8; = V~!/2S,;V~—!/2. Furthermore, it was shown that similar re- 
sults hold if the diffuse prior in (3.6) is replaced by the natural conjugate 
prior distribution. 


Tne 1 


mB it m(l — Bj? 


Fha) < E 


3.2 Products 


The distribution of products of Student’s t random variables has been 
studied by Harter (1951) and Wallgren (1980). 

Products of Student's t random variables arise naturally in classification 
problems. In many educational and industrial problems it may 
be necessary to classify persons or objects into one of two categories: 
those fit and those unfit for a particular purpose. In formulating a 
classification problem, assume that for $p$ tests one knows the scores of 
$N_1$ individuals known to belong to population $\Pi_1$ and of $N_2$ individuals 
known to belong to population $\Pi_2$, along with those of the individual 
under consideration, a member of the population $\Pi$, where it is known 
a priori that $\Pi$ is identical with either $\Pi_1$ or $\Pi_2$. Assume further that 
the distributions of the test scores of the individuals making up $\Pi_1$ and 
$\Pi_2$ are two p-dimensional normal distributions, which possess the same 
covariance matrix but are independent of each other. In order to classify 
the individual in question into either $\Pi_1$ or $\Pi_2$, Wald (1944) introduced 
the statistic $V$ given by 


\[ V = \sum_{i=1}^{p} \sum_{j=1}^{p} s^{ij}\, Y_{i,n+1} Y_{j,n+2}, \tag{3.8} \]

where $n = N_1 + N_2 - 2$, $s^{ij}$ is the $(i,j)$th element of the inverse of the 
matrix $S$ defined by 

\[ s_{ij} = \frac{1}{n} \sum_{k=1}^{n} Y_{i,k} Y_{j,k} \]


and Yi, (i = 1,...,p; k = 1,...,2 + 2) are iid normally distributed 
random variables with unit variances and expected values E(Y;,,) = 0, 
k= 1, OERE E(Y;n+1) = f, and E(Yin+2) = u2. 
In the particular case p = 1, (3.8) can be written as 


V = nrima ; (3.9) 


where 
1 > 3 
= = Yi vk: 
n 
k=1 


Thus, one sees that the V in (3.9) is a product of two independent 
Student’s t variates. Harter (1951) derived the exact distribution of this 
V. In particular, he showed that the pdf of | V | is given by 

1 


f(lv|) = ey Chen | v | +s+n/2) 


j=0 


if | v |> n/2, and by 
29 (-1)in"/ ti 1)Ín”/2+i 


_ exp ( exp (—43/2) 
fle) = ~ al (n/2) Lata (1+) 


k 

24+ 27 +n\ < (243) 2+n+2j 

2 perce ane et 
xT (ir P (2k)! PUk+ 7 


| v [7 0+i+n/2) 


if | v |< n/2. 

Wallgren (1980) studied the product of two correlated Student's t variates. 
This was motivated by hypothesis testing problems that assume 
that the relationship between two regression lines $y = \alpha_1 + \beta_1 x$ and 
$y = \alpha_2 + \beta_2 x$ holds for all real $x$. Although it is often true that such a 
relationship holds for all real $x$, there are instances in which the relationship 
may hold true only for $x$'s in a given interval, say $[c_1, c_2]$. Wallgren 
(1980) showed that the statistic for testing this hypothesis is a product 
of two correlated t-variates; specifically, it takes the form $W = XY/S^2$, 
where $(X,Y)$ has a bivariate normal distribution with means $(\mu_1, \mu_2)$, 
common variance $\sigma^2$, and the correlation matrix 

\[ R = \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}. \tag{3.10} \]

Moreover, $\nu S^2/\sigma^2$ is independent of $(X,Y)$ and has a chi-squared 
distribution with degrees of freedom $\nu$. 




The limiting distribution of $W = XY/S^2$ as $\nu \to \infty$ is that of $Z = XY/\sigma^2$. 
The distribution of $Z$ has been studied by Aroian et al. (1978). The 
study of the distribution of $W$ is, therefore, a generalization of the study 
by Aroian et al. to the case where $\sigma^2$ is unknown. 

In the central case $\mu_1 = 0$ and $\mu_2 = 0$, Wallgren (1980) showed that 
the cdf $F(w) = \Pr(W \le w)$ of $W$ is given by 


F(w) = 1- [ Qu (650, a8 


for -1 <p < 0, w > 0; 
0 
/ Qu (0; p,w) dê 
a-T 


T+a 
Fw) = 1- f Q, (8; p, w) d0 
0 
for 1 > p > 0, w > 0; and 
0 
f enu d0 
Q 


vsinĝsin(0 +4) \"” 
w + vsin ĝ sin (8 + A) , 


/1 — 72 
a= uean (=F), 


p 


for -l<p<0,w <0; 


for 1 > p > 0, w < 0, where 


rQ (65,0) = ( 


and angle $A$ is defined by $\sin A = \sqrt{1-\rho^2}$, $\cos A = \rho$ for $0 < A < \pi$. 
The corresponding pdf $f(w)$ can be obtained by differentiation. The 
pdf has a singularity at $w = 0$ and, considered as a function of $\rho$ and $w$, 
$f(w; \rho) = f(-w; -\rho)$. The limit of $f(w)$ as $\rho \to 1^-$ is the $F_{1,\nu}$ density. 
Moreover, if $\rho_1 < \rho_2$, then $F(w; \rho_1) > F(w; \rho_2)$ for any $w$. 

In the noncentral case, the cdf F (w) is given by 


T a h(s) æL h(s) exp (—v*/2) 


2 oy Nees 
x (ereraa AE æ) duds 


vl- 
Fa [owe h(s) exp aoe) 


—ws? À À 
xo ee 2) +Ai + pv iuda, 
y1- 
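where $\lambda_i = \mu_i/\sigma$, $i = 1, 2$, are the noncentrality parameters and 

\[ h(s) = \frac{\nu^{\nu/2} s^{\nu-1} \exp(-\nu s^2/2)}{2^{(\nu-2)/2}\,\Gamma(\nu/2)}. \]

The two double integrals above are bounded above by $\Phi(\lambda_2)$ and $\Phi(-\lambda_2)$, 
respectively. As a function of $\lambda_1$ and $\lambda_2$, $F$ has the following properties: 

\[ F(w; \lambda_1, \lambda_2) = F(w; \lambda_2, \lambda_1), \]
\[ F(w; -\lambda_1, -\lambda_2) = F(w; \lambda_1, \lambda_2), \]

or, equivalently, 

\[ F(w; \lambda_1, -\lambda_2) = F(w; -\lambda_1, \lambda_2), \]

\[ \lim_{\lambda_2 \to \infty} F(w; c\lambda_2, \lambda_2) = 0 \quad \text{if } c > 0, \]
\[ \lim_{\lambda_2 \to \infty} F(w; c\lambda_2, \lambda_2) = 1/2 \quad \text{if } c = 0, \]
\[ \lim_{\lambda_2 \to \infty} F(w; c\lambda_2, \lambda_2) = 1 \quad \text{if } c < 0, \]
\[ \lim_{\lambda_1 \to \infty} F(w; \lambda_1, B) = \Phi(-B), \]

and 

\[ \lim_{\lambda_1 \to -\infty} F(w; \lambda_1, B) = \Phi(B). \]

Also, for $\lambda_1 \ge 0$, $\lambda_2 \ge 0$, $w > 0$, and $-1 < \rho < 0$, $F(w; \lambda_1, \lambda_2)$ is a 
strictly decreasing function of $\lambda_1$ and $\lambda_2$. Thus, the maximum of $F(w)$ 
over the region $\lambda_1 \ge 0$, $\lambda_2 \ge 0$ occurs at $\lambda_1 = \lambda_2 = 0$. 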
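Since $(X,Y)$ and $S^2$ are independent, the first two moments of $W$ are 
given by 

\[ E(W) = E(XY)\, E(1/S^2) \]

and 

\[ E(W^2) = E(X^2Y^2)\, E(1/S^4). \]

It is known (see, for example, Craig, 1936) that 

\[ E(XY) = \sigma^2 (\lambda_1\lambda_2 + \rho), \]

\[ E(X^2Y^2) = \sigma^4 \left(\lambda_1^2 + \lambda_2^2 + 4\rho\lambda_1\lambda_2 + \lambda_1^2\lambda_2^2 + 1 + 2\rho^2\right), \]

and 

\[ E\left(1/S^{2i}\right) = \frac{\nu^i\, \Gamma(\nu/2 - i)}{(2\sigma^2)^i\, \Gamma(\nu/2)} \]

for $\nu > 2i$. 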
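The moment formulas above can be evaluated directly. The following is a minimal sketch, not taken from Wallgren (1980); the function names `inv_s2_moment` and `w_moments` are hypothetical, and the code simply plugs the parameters into the expressions for E(XY), E(X^2 Y^2), and E(1/S^{2i}) given in the text.

```python
import math


def inv_s2_moment(i, nu, sigma2=1.0):
    """E(1/S^(2i)) when nu*S^2/sigma^2 is chi-squared with nu degrees of freedom."""
    if nu <= 2 * i:
        raise ValueError("the moment exists only for nu > 2i")
    # nu^i * Gamma(nu/2 - i) / ((2 sigma^2)^i * Gamma(nu/2)), computed via lgamma
    log_ratio = math.lgamma(nu / 2 - i) - math.lgamma(nu / 2)
    return nu ** i * math.exp(log_ratio) / (2 * sigma2) ** i


def w_moments(lam1, lam2, rho, nu, sigma2=1.0):
    """Mean and variance of W = XY/S^2 (the variance requires nu > 4)."""
    e_xy = sigma2 * (lam1 * lam2 + rho)
    e_x2y2 = sigma2 ** 2 * (lam1 ** 2 + lam2 ** 2 + 4 * rho * lam1 * lam2
                            + lam1 ** 2 * lam2 ** 2 + 1 + 2 * rho ** 2)
    mean = e_xy * inv_s2_moment(1, nu, sigma2)
    second = e_x2y2 * inv_s2_moment(2, nu, sigma2)
    return mean, second - mean ** 2


if __name__ == "__main__":
    # Central case with rho = 0.5 and nu = 10; sigma^2 cancels out of both moments.
    print(w_moments(0.0, 0.0, 0.5, 10))
```

In the central case the mean reduces to rho * nu / (nu - 2), which gives a quick sanity check on the implementation.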


3.3 Ratios 


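For a bivariate t random vector $(X_1, X_2)$ with degrees of freedom $\nu$, 
mean vector $(m_x, m_y)$, and correlation matrix $I_2$, define the ratio 
$W = X_1/X_2$. The distribution of this ratio is of interest in problems in 
econometrics and ranking and selection. Press (1969) derived the pdf of 
$W$ as 

\[ f(w) = \frac{k_1}{1+w^2} \left[ 1 + \frac{k_2\, q}{(q^*)^{\nu+1}} \left\{ 2\,T_{\nu+1}\!\left( \frac{q\sqrt{\nu+1}}{q^*} \right) - 1 \right\} \right], \quad -\infty < w < \infty, \tag{3.11} \]

where $T_{\nu+1}$ denotes the cdf of the univariate Student's t distribution 
with degrees of freedom $\nu + 1$ and $k_1$, $k_2$, $q$, and $q^*$ are constants defined 
by 

\[ k_1 = \frac{1}{\pi} \left( 1 + \frac{m_x^2 + m_y^2}{\nu} \right)^{-\nu/2}, \]

\[ k_2 = \frac{\sqrt{\pi}\, \nu^{(\nu+2)/2}\, \Gamma((\nu+1)/2)}{2\, \Gamma((\nu+2)/2)} \left( 1 + \frac{m_x^2 + m_y^2}{\nu} \right)^{\nu/2}, \]

\[ q = \frac{m_x w + m_y}{\sqrt{1+w^2}}, \]

and 

\[ q^* = \sqrt{m_x^2 + m_y^2 + \nu - q^2}, \]

respectively. In the special case $m_x = m_y = 0$, (3.11) is the pdf of a 
Cauchy distribution. The asymptotic distribution of $W$ as $\nu \to \infty$ is 
that of the ratio of independent normal random variables. This fact can 
be verified as follows. Let $f_\infty(w) = \lim_{\nu\to\infty} f(w)$. Then it is easy to 
check that 

\[ \lim_{\nu\to\infty} \frac{k_2}{(q^*)^{\nu+1}} = \frac{1}{2\phi(q)}, \]

\[ \lim_{\nu\to\infty} \frac{q\sqrt{\nu+1}}{q^*} = q, \]

\[ \lim_{\nu\to\infty} T_{\nu+1}(z) = \Phi(z), \]

and 

\[ \lim_{\nu\to\infty} k_1 = \frac{1}{\pi} \exp\left\{ -\frac{1}{2}\left(m_x^2 + m_y^2\right) \right\}, \]

where $\phi$ and $\Phi$ are, respectively, the pdf and the cdf of the standard 
normal distribution; thus, the limiting density $f_\infty$ is given by 

\[ f_\infty(w) = \frac{1}{\pi(1+w^2)} \left[ 1 + \frac{q\,\{2\Phi(q) - 1\}}{2\phi(q)} \right] \exp\left\{ -\frac{1}{2}\left(m_x^2 + m_y^2\right) \right\}, \]

which is the same density found by Marsaglia (1965, page 196, equation 
(5)) for the ratio of independent normal random variables with means 
$(m_x, m_y)$ and unit variances. 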
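The density (3.11) is straightforward to evaluate numerically. The sketch below is illustrative only and is not taken from Press (1969): the helper names `simpson`, `t_cdf`, `t_ratio_pdf`, and `total_mass` are hypothetical, the Student's t cdf is obtained by Simpson's rule rather than from a statistics library, and the constants are assumed to be $k_1$, $k_2$, $q$, and $q^*$ as defined for a bivariate t vector with identity correlation matrix.

```python
import math


def simpson(f, a, b, n=200):
    """Composite Simpson's rule with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def t_cdf(x, df):
    """Student's t cdf, using the substitution u = sqrt(df) * tan(t)."""
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(math.pi)
    upper = math.atan(x / math.sqrt(df))
    return 0.5 + c * simpson(lambda t: math.cos(t) ** (df - 1), 0.0, upper)


def t_ratio_pdf(w, mx, my, nu):
    """Density (3.11) of W = X1/X2."""
    m2 = mx * mx + my * my
    q = (mx * w + my) / math.sqrt(1 + w * w)
    q_star = math.sqrt(m2 + nu - q * q)
    k1 = (1 + m2 / nu) ** (-nu / 2) / math.pi
    log_k2 = (0.5 * math.log(math.pi) + (nu + 2) / 2 * math.log(nu)
              + math.lgamma((nu + 1) / 2) - math.log(2)
              - math.lgamma((nu + 2) / 2) + nu / 2 * math.log(1 + m2 / nu))
    ratio = math.exp(log_k2 - (nu + 1) * math.log(q_star))  # k2 / (q*)^(nu+1)
    bracket = 2 * t_cdf(q * math.sqrt(nu + 1) / q_star, nu + 1) - 1
    return k1 / (1 + w * w) * (1 + ratio * q * bracket)


def total_mass(mx, my, nu, n=1000):
    """Integral of the density over the real line, midpoint rule in t with w = tan(t)."""
    h = math.pi / n
    total = 0.0
    for i in range(n):
        w = math.tan(-math.pi / 2 + (i + 0.5) * h)
        total += t_ratio_pdf(w, mx, my, nu) * (1 + w * w) * h
    return total


if __name__ == "__main__":
    print(t_ratio_pdf(0.7, 0.0, 0.0, 5), 1 / (math.pi * (1 + 0.7 ** 2)))
    print(total_mass(1.0, 3.0, 5))
```

Two checks are built into the usage lines: with $m_x = m_y = 0$ the value should agree with the Cauchy density, and the total mass should be close to one for any admissible parameters.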

The density $f(w) = f(w; m_x, m_y)$ may be confined to positive values 
$m_x > 0$, $m_y > 0$ since (3.11) shows that 

\[ f(w; -m_x, m_y) = f(-w; m_x, m_y), \]

\[ f(w; m_x, -m_y) = f(-w; m_x, m_y), \]

and 

\[ f(w; -m_x, -m_y) = f(w; m_x, m_y). \]

Figure 3.1 shows how variations in $(m_x, m_y, \nu)$ affect the shape of the 
density. It may be seen by reference to Marsaglia (1965) that the shapes 
are similar to that of the ratio of normal random variables, even for 
small values of $\nu$. 

The percentage points of $W$ defined by $\Pr(W \le w) = p$ are tabulated 
in Press (1969) for cumulative probabilities p = 0.01, 0.05, 0.10, 
0.90, 0.95, 0.99; $\nu$ = 1, 2, 5, 10, 30; and for some 16 selected values 
of $(m_x, m_y)$. Here we provide the tables of $w$ for $(m_x, m_y) = (0, 0)$, 
$(m_x, m_y) = (1, 3)$, $(m_x, m_y) = (3, 1)$, and $(m_x, m_y) = (3, 3)$. 
Fig. 3.1. Densities of the t-ratio distribution (3.11) for $(m_x, m_y, \nu)$ = (0,0,1), 
(0,0,30), (1,3,1), (1,3,30), (3,1,1), (3,1,30), (3,3,1), and (3,3,30) 
Percentage points w for (m_x, m_y) = (0,0) 

 ν  | p=0.01  | p=0.05 | p=0.1  | p=0.9 | p=0.95 | p=0.99 
 1  | -31.820 | -6.314 | -3.078 | 3.078 | 6.314  | 31.820 
 2  | -31.820 | -6.314 | -3.078 | 3.078 | 6.314  | 31.820 
 5  | -31.820 | -6.314 | -3.078 | 3.078 | 6.314  | 31.820 
 10 | -31.820 | -6.314 | -3.078 | 3.078 | 6.314  | 31.820 
 30 | -31.820 | -6.314 | -3.078 | 3.078 | 6.314  | 31.820 


Percentage points w for (m_x, m_y) = (1,3) 

 ν  | p=0.01  | p=0.05 | p=0.1  | p=0.9 | p=0.95 | p=0.99 
 1  | -10.254 | -1.794 | -0.721 | 1.321 | 2.394  | 10.853 
 2  | -5.791  | -0.902 | -0.357 | 1.120 | 1.795  | 6.842 
 5  | -1.331  | -0.382 | -0.166 | 1.018 | 1.486  | 6.464 
 10 | -1.041  | -0.312 | -0.135 | 0.951 | 1.288  | 2.892 
 30 | -0.681  | -0.256 | -0.109 | 0.922 | 1.211  | 2.350 


Percentage points w for (m_x, m_y) = (3,1) 

 ν  | p=0.01  | p=0.05  | p=0.1  | p=0.9 | p=0.95 | p=0.99 
 1  | -51.268 | -8.970  | -3.604 | 6.604 | 11.970 | 54.267 
 2  | -58.076 | -10.187 | -4.034 | 7.337 | 13.422 | 61.290 
 5  | -64.675 | -11.427 | -4.524 | 8.007 | 14.774 | 67.984 
 10 | -67.570 | -11.984 | -4.756 | 8.293 | 15.355 | 70.894 
 30 | -69.742 | -12.406 | -4.934 | 8.505 | 15.788 | 73.074 


Percentage points w for (m_x, m_y) = (3,3) 

 ν  | p=0.01  | p=0.05 | p=0.1  | p=0.9 | p=0.95 | p=0.99 
 1  | -12.970 | -1.852 | -0.442 | 2.242 | 3.652  | 14.770 
 2  | -8.172  | -0.474 | 0.183  | 2.141 | 3.205  | 11.028 
 5  | -2.668  | 0.229  | 0.425  | 2.038 | 2.802  | 7.498 
 10 | -0.118  | 0.334  | 0.477  | 1.986 | 2.627  | 6.019 
 30 | 0.130   |        |        | 1.943 | 2.497  | 

Since it is evident from (3.11) that $f(w) \to 0$ rapidly as $m_x$ and $m_y$ 
become large, values of $m_x$, $m_y$ greater than 3 are not considered. 
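For $(m_x, m_y) = (0, 0)$ the ratio is Cauchy whatever the value of $\nu$, so the corresponding percentage points can be reproduced from the Cauchy quantile function. The short sketch below does this; the function name `cauchy_quantile` is hypothetical and the code is only a consistency check against the tabulated values, not part of Press's computations.

```python
import math


def cauchy_quantile(p):
    """Quantile of the standard Cauchy distribution: tan(pi * (p - 1/2))."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return math.tan(math.pi * (p - 0.5))


if __name__ == "__main__":
    for p in (0.01, 0.05, 0.1, 0.9, 0.95, 0.99):
        # Values agree with the (0, 0) table to three decimals for every nu.
        print(p, round(cauchy_quantile(p), 3))
```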

The t-ratio distribution has one or two modes, depending upon the 
values of the parameters. The locations of these modes are solutions of 
the equation 


Te (£vo+i) + Aw) (Lvr%3) = B(w), (3.12) 


* * 


where 
a 2 ee Erer 
SA (qt)? 3qw +m VI + w?’ 
—v/2 
Ba) =< 24 AW) wado Fayre ley} 
w) = a7 2 27T ((v +1)/2) m,V1+w? + 3qw 


—(v+2)/2 


Teram VFM {+a} 
Vat ((v+1)/2)  q* {ma V1 +w? + 3qw} 
x {w 1+w?- qme}, 


and q and q* are as defined above. Note that since T, may be expressed 
in closed form in terms of elementary functions, (3.12) yields the modes 
in terms of elementary functions only. 


From (3.11), 

\[ \lim_{w\to\infty} q = m_x \]

and 

\[ \lim_{w\to\infty} q^* = \sqrt{m_y^2 + \nu}. \]

Thus, 

\[ \lim_{w\to\infty} w^2 f(w) = \text{constant}. \]

Hence the distribution of $W$ can have no finite moments of positive 
integer order. 

Kappenman (1971) extended Press's ratio distribution to the multivariate 
case by considering the joint pdf of $W^T = (X_2/X_1, X_3/X_1, \ldots, 
X_p/X_1)$, where $X = (X_1,\ldots,X_p)^T$ is a p-variate t random vector with 
degrees of freedom $\nu$, mean vector $\mu$, and correlation matrix $R$. The 
expressions for the joint pdf turned out to depend on whether $p$ is odd 
or even. Introduce the following notation: 

\[ V^T = (1, W^T), \]

\[ M = V^T R^{-1} V, \]

\[ K = -2 V^T R^{-1} \mu, \]

\[ L = \nu + \mu^T R^{-1} \mu, \]

\[ a^2 = \frac{L}{M} - \frac{K^2}{4M^2}, \]

and 

\[ b = \frac{K}{2M}. \]

Then the expressions for f(w) are 


MEN "SX" E e E T 
fw) = 2 a2* pp 
Pl? (Ma)’*?)/? Tw) Z -1-2k 
ca —(v+p)/2 
<f u* {au? +1} v+P)/2 du 
if p is odd; 
2abP-1 |R|! v/T ((v + p)/2 
rice [R| ((v + p)/2) 


nPI? (Ma) PT (y/2) 
p/2 = si 
EEO” 


x f u2k-l {a?u? + ari du 
—b/a 


(p—2)/2 
p-l a\ 2k 
a2 Eor (3) 


k=0 


—b/a s 
x | u2k {a?u? ¥ ap eTa au 
0 
if p is even and b < 0; and 


2ab?- RJT! W/T ((v + p)/2) 
aP M(¥+P)/2T (y/2) 


p/2 
p—1 ay 2k-1 
# b a a (5) 
foe) 
«| ykol {aw pa E di 
bja 
(p—2)/2 
p-1 ay 2k 
i 2 a) (5) 
b/a 
«| u fay? Jay O a 
0 


if p is even and b > 0. The integrals in these expressions can easily 
be rewritten in terms of the gamma and incomplete beta functions; see 
Section 3 of Kappenman (1971) for details. 


fw) = 


4 


Bivariate Generalizations and Related 
Distributions 


In this chapter, we shall survey a number of specific bivariate distributions 
that contain Student's t components. 


4.1 Owen's Noncentral Bivariate t Distribution 


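Let $Y_1$ and $Y_2$ have the bivariate normal distribution with zero means, 
unit variances, and correlation 1. Let $\nu S^2$ have the chi-squared 
distribution with degrees of freedom $\nu$ and be independent of the $Y$'s. Then 
$X_1 = (Y_1 + \delta_1)/S$ and $X_2 = (Y_2 + \delta_2)/S$ have the noncentral univariate 
t distributions with degrees of freedom $\nu$ and noncentrality parameters 
$\delta_1$ and $\delta_2$, respectively. Owen (1965) studied the joint distribution of 
$(X_1, X_2)$, a noncentral bivariate t distribution. 
The marginal cdf of $X_j$, $j = 1, 2$, may be written as 

\[ \Pr(X_j \le y) = \frac{\sqrt{2\pi}}{\Gamma(\nu/2)\, 2^{(\nu-2)/2}} \int_0^\infty x^{\nu-1} \phi(x)\, \Phi\!\left( \frac{yx}{\sqrt{\nu}} - \delta_j \right) dx, \tag{4.1} \]

where $\phi$ and $\Phi$ are, respectively, the pdf and the cdf of the standard 
normal distribution. Integrating by parts repeatedly, one obtains for 
odd values of $\nu$ 

\[ \Pr(X_j \le y) = \Phi\left(-\delta_j\sqrt{B}\right) + 2T\left(\delta_j\sqrt{B}, A\right) + 2\left[ M_1 + M_3 + \cdots + M_{\nu-2} \right] \]

and for even values of $\nu$ 

\[ \Pr(X_j \le y) = \Phi(-\delta_j) + \sqrt{2\pi} \left[ M_0 + M_2 + \cdots + M_{\nu-2} \right]. \]

Here, 

\[ A = \frac{y}{\sqrt{\nu}}, \qquad B = \frac{\nu}{\nu + y^2}, \]

\[ T(h, a) = \frac{1}{2\pi} \int_0^a \frac{\exp\left\{ -(1+x^2)h^2/2 \right\}}{1+x^2}\, dx \tag{4.2} \]

(a function discussed and tabulated in Section 46.4 of Kotz et al., 2000), 
and the $M$'s are defined recursively by 

\[ M_0 = A\sqrt{B}\, \phi\left(\delta_j\sqrt{B}\right) \Phi\left(\delta_j A\sqrt{B}\right), \]

\[ M_1 = B \left\{ \delta_j A M_0 + \frac{A}{\sqrt{2\pi}}\, \phi(\delta_j) \right\}, \]

\[ M_2 = \frac{B}{2} \left\{ \delta_j A M_1 + M_0 \right\}, \]

\[ M_3 = \frac{2B}{3} \left\{ \delta_j A M_2 + M_1 \right\}, \]

and 

\[ M_k = \frac{(k-1)B}{k} \left\{ a_k \delta_j A M_{k-1} + M_{k-2} \right\}, \quad k \ge 4, \]

where $a_k = 1/((k-2)a_{k-1})$, $k \ge 3$, and $a_2 = 1$. Two special cases of 
(4.1) are 

\[ \Pr(X_j \le 0) = \Phi(-\delta_j) \]

and 

\[ \Pr(X_j \le 1) = 1 - \Phi^2\left(\delta_j/\sqrt{2}\right), \quad \nu = 1. \]

Also, if $\delta_j = 0$, then (4.1) is just the cdf of the Student's t distribution. 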
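The finite-sum representation for even $\nu$ can be checked against direct quadrature of (4.1). The following is a minimal sketch under stated assumptions: the function names `nct_cdf_integral` and `nct_cdf_even` are hypothetical, $A = y/\sqrt{\nu}$ and $B = \nu/(\nu + y^2)$ are taken as in Owen's notation, and the integral is truncated at a finite upper limit where the integrand is negligible.

```python
import math


def phi(x):
    """Standard normal pdf."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def Phi(x):
    """Standard normal cdf."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def simpson(f, a, b, n=4000):
    """Composite Simpson's rule with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0


def nct_cdf_integral(y, nu, delta):
    """Noncentral t cdf by direct quadrature of the integral (4.1)."""
    log_c = (0.5 * math.log(2 * math.pi) - math.lgamma(nu / 2)
             - (nu - 2) / 2 * math.log(2))
    upper = 10.0 + math.sqrt(nu)  # integrand is negligible beyond this point
    f = lambda x: x ** (nu - 1) * phi(x) * Phi(y * x / math.sqrt(nu) - delta)
    return math.exp(log_c) * simpson(f, 0.0, upper)


def nct_cdf_even(y, nu, delta):
    """Noncentral t cdf from the recursion M_0, M_1, ... for even nu."""
    if nu < 2 or nu % 2 != 0:
        raise ValueError("nu must be an even integer >= 2")
    A = y / math.sqrt(nu)
    B = nu / (nu + y * y)
    rB = math.sqrt(B)
    M = [A * rB * phi(delta * rB) * Phi(delta * A * rB)]                          # M_0
    M.append(B * (delta * A * M[0] + A * phi(delta) / math.sqrt(2 * math.pi)))   # M_1
    a = 1.0  # a_2 = 1
    for k in range(2, nu - 1):
        if k >= 3:
            a = 1.0 / ((k - 2) * a)  # a_k = 1 / ((k - 2) a_{k-1})
        M.append((k - 1) / k * B * (a * delta * A * M[k - 1] + M[k - 2]))
    return Phi(-delta) + math.sqrt(2 * math.pi) * sum(M[0:nu - 1:2])


if __name__ == "__main__":
    print(nct_cdf_even(1.5, 4, 0.7), nct_cdf_integral(1.5, 4, 0.7))
```

The recursion needs only a handful of normal pdf and cdf evaluations, which is why it was attractive before numerical integration became cheap; the quadrature routine is included here purely as an independent check.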


Owen (1965) expressed the joint cdf of $(X_1, X_2)$ in terms of functions 
of the form 

\[ Q(y, \delta; 0, R) = \frac{\sqrt{2\pi}}{\Gamma(\nu/2)\, 2^{(\nu-2)/2}} \int_0^R x^{\nu-1} \phi(x)\, \Phi\!\left( \frac{yx}{\sqrt{\nu}} - \delta \right) dx \]

and 

\[ Q(y, \delta; R, \infty) = \frac{\sqrt{2\pi}}{\Gamma(\nu/2)\, 2^{(\nu-2)/2}} \int_R^\infty x^{\nu-1} \phi(x)\, \Phi\!\left( \frac{yx}{\sqrt{\nu}} - \delta \right) dx. \]

For example, if one assumes, without loss of generality, that $y_1 > y_2$, 
then the following relations hold 

\[ \Pr(X_1 \le y_1, X_2 \le y_2) = Q(y_1, \delta_1; 0, R) + Q(y_2, \delta_2; R, \infty) \]

and 

\[ \Pr(X_1 \le y_1, X_2 \le y_2) = \Pr(X_2 \le y_2), \]

for $\delta_1 > \delta_2$ and $\delta_1 \le \delta_2$, respectively. The formulas for $Q(y, \delta; 0, R)$ and 
$Q(y, \delta; R, \infty)$ can be obtained by integration by parts. Since $Q(y, \delta; 0, R) + 
Q(y, \delta; R, \infty) = \Pr(X_j \le y)$, it is sufficient to know the formula for one 
of the $Q$ terms. Owen obtained the following formulas for $Q(y, \delta; 0, R)$ 
for odd and even values of $\nu$, respectively 

Q(y,6;0,R) = &(R)—2T(R,(AR — ô)/R) 
-2T (5VB, (5AB — R) /B5) + 27 (5VB, A) 
+1 {5 <0}-1+2{ My +H, + Mj + Hs 
30 Me oe A -2} 
and 
Qly, 5;0,R) = ®(-6)+ v2n{ Mg + Ho + Mi + H 
oh M? 3+ H,-2}. 


Here, T(h, a) is as defined in (4.2) and the M*’s are defined recursively 
by 


Mj = AvBo(5VB) {8 (54vB) - © (54B - R)/ VB) }, 


Mi = B[sAM; + Ao (VB) {4 (sAvB) 
-¢ ((5AB - R)/ VB) }], 


B 
My = 3 {AM} +M} - Li, 
k-1)B 
M; = C-IE Car AM; a + Mi-a} - Det, k > 3, 


and the H’s are defined recursively by 


Ho = —$(R)®(AR - ô) 
and 
Hy = ak+2RHk1, k>1, 
where 
Lp-1 = ak+2RLk-2, k>3 


with the initial value Lı = (ABR/2)¢(R)¢(AR — ô) and 


pee eee 
(k — 2)ak-1 ° 


with the initial values a; = a2 = 1. 


ar k>3 


4.2 Siddiqui’s Noncentral Bivariate t Distribution 


Siddiqui (1967) considered the joint distribution of Student's t variates 
when $(Y_{1i}, Y_{2i})$, $i = 1,\ldots,N$, is a random sample from the bivariate 
normal distribution with zero means, unit variances, and correlation 
coefficient $\rho$. Let 

\[ \bar{Y}_1 = \frac{1}{N} \sum_{i=1}^{N} Y_{1i}, \]

\[ \bar{Y}_2 = \frac{1}{N} \sum_{i=1}^{N} Y_{2i}, \]

\[ S_j^2 = \frac{1}{N} \sum_{i=1}^{N} \left( Y_{ji} - \bar{Y}_j \right)^2, \quad j = 1, 2, \]

and 

\[ R = \frac{1}{N S_1 S_2} \sum_{i=1}^{N} \left( Y_{1i} - \bar{Y}_1 \right) \left( Y_{2i} - \bar{Y}_2 \right). \]

The interest is in the joint distribution of the Student's t variates 

\[ (X_1, X_2) = \left( \frac{\sqrt{N-1}\, \bar{Y}_1}{S_1}, \frac{\sqrt{N-1}\, \bar{Y}_2}{S_2} \right). \tag{4.3} \]


It is well known that the joint pdf of (Yi, Yo, S1, S2, R) is 


f (Hi, G2; $1, 82, r) 
NY (si82)¥? (1 - r2) 9” a fz +s 
T(N — 2) (1 = p) ~P P|- 20 =p) 


+9” + s2 — 2p (9 +175) 52) | 


for $-\infty < \bar{y}_1 < \infty$, $-\infty < \bar{y}_2 < \infty$, $0 < s_1 < \infty$, $0 < s_2 < \infty$, and 
$-1 < r < 1$ (see, for example, Kendall and Stuart, 1958). After suitable 
transformation, Siddiqui obtained the joint pdf of $(X_1, X_2, R)$ in the 
form 

rw +2) (1- pyre (fs r2) 072/2 
(2n)3/2T (v + 3/2) (1 — b — cr)” t! 


2 2 —(v+1)/2 
«{0+2) +3) 
Vv V 


11 3 1+b+er 
x 20; (pprt yoy), (4.4) 


f (a1, 22,7) 


where $\nu = N - 1$, ${}_2F_1$ is the Gauss hypergeometric function, 


PT1T2 


z? T2’ 
Vay Beaty da 


fee es 
x? z2 
yore NEE 


It is easily seen that the limit of this joint pdf as $\nu \to \infty$ is trivariate 
normal with independent components $(X_1, X_2)$ and $R$. Integrating out 


b= 


and 
$r$ in (4.4), Siddiqui showed further that the asymptotic joint pdf of 
$(X_1, X_2)$ as $\nu \to \infty$ becomes 
Tr(v + 2) (1- yore 


f (1, 22) ~ “Soro ee 


2 2 —(v+1)/2 
«{0+2) +2) 
v 


11 31-B+e 
F =; =). 

Sa (Gs PtP Ao ) 
The exact joint pdf of $(X_1, X_2)$ was also given for the two special cases 
$\nu = 1$ and $\nu = 3$. For $\nu = 1$, the joint pdf reduces to a bivariate Cauchy 
distribution 

(1 — p?) csc? 8 T 

‘ = OO +- tO>, (4. 
an) = ETB Trapt + (5-6) cote}, (4.5) 


where 
2p (1 — p°) (1+ yrye) 
JVlty?J/1+y2 


In fact, if $\rho = 0$ in (4.5), then one arrives at a product of two independent 
Cauchy densities. For $\nu = 3$ the joint pdf is 


32v2(1—9?)" eN (BY 
f(t1,22) = (+ uy (+2) Is, 


cos@ = 


where 
T'(9/2)0'?(k + 1/2) l fk 
-È P(k +9/2)r?(1/2)k! & S 5-21 mai (- z) (7) 


x fa -b — c) 5/2 —(1-b+ e l 


4.3 Patil and Liao’s Noncentral Bivariate t Distribution 


Patil and Liao (1970) provided an extension of Siddiqui's work when 
$(Y_{1i}, Y_{2i})$, $i = 1,\ldots,N$, is a random sample from the bivariate normal 
distribution with zero means, common variance $\sigma^2$, and correlation 
coefficient $\rho$. Instead of considering the joint distribution of (4.3), they 
considered the joint distribution of 

\[ (X_1, X_2) = \left( \frac{\sqrt{N}\, \bar{Y}_1}{S}, \frac{\sqrt{N}\, \bar{Y}_2}{S} \right), \tag{4.6} \]

where $S$ is the pooled sample standard deviation defined by 

\[ S^2 = \frac{\sum_{i=1}^{N} \left( Y_{1i} - \bar{Y}_1 \right)^2 + \sum_{i=1}^{N} \left( Y_{2i} - \bar{Y}_2 \right)^2}{2(N-1)}. \]

Exact expressions were given for the joint pdf and cdf of $(X_1, X_2)$ and 
for the corresponding marginal distributions. For instance, if $N$ is odd 
and equal to $2q + 3$, then the joint pdf of $(X_1, X_2)$ can be represented 
in the form 


f (a1, 22) : 


Beat a(q + 1) (1p?) T2(q + 1) 


x 5 (o) (-1)9-* eee T (2q -k +1) 


p 


k 
2_9 2) ~(k+2) 
2(1+p) 8(q+1) (1-9?) 
2 


q—k l 
p T(k+142) 1 
7 2 (27) l! ES 


-(k+1+2) 
y? — 2pyrye + y? | 


*8q+ (1-2) 


while if N is even, then the joint pdf of (X1, X2) becomes 
(1 +o- 


fns) = NaN a -anD 
= 1— N)/2 : 
E(w ero fha} 
1 Ta o 
2(1+p) 4(q-1)(1-p?) 


4.4 Krishnan’s Noncentral Bivariate t Distribution 


Krishnan (1972) provided another extension of Siddiqui's work when 
$(Y_{1i}, Y_{2i})$, $i = 1,\ldots,N$, is a random sample from the bivariate normal 
distribution with means $(\gamma, \delta)$, unit variances, and correlation coefficient 
$\rho$. She considered the joint distribution of 

VNI VN- vi) 


(4.7) 


a al ( Sı i S2 
where $S_1$ and $S_2$ are correlated chi-squared random variables independent 
of $\bar{Y}_1$ and $\bar{Y}_2$, respectively. Series representations were derived 
for the joint pdf and the cdf of $(X_1, X_2)$. One of the representations 
given for the joint pdf is 


_ B o0 i a z? —(N+j)/2 
feue = Bafe (is a) 0+ 


j=0 


x dm,j» (4.8) 


where 
2(1—p?)’ 


E / A3 NN 24-N p3-N j A 
B = A t-A +ô — 2p7ô) }, 


a Vk (2pAz1£2)7 
(N = DG = k)! 


PE D o a J AEII (N +k = 1)/2)}, if k is even, 
f if k is odd, 


O (2f (2A) (NN +i+j N+j+m-i 
tos = Do eee a S 


VA (pô — 7) a1 
JN -1+zr? 
and 


VA (py — ô) 22 
JN —14+23— 


In the central case $\gamma = \delta = 0$, (4.8) reduces to the form derived by Patil 
and Liao (1970). 




4.5 Krishnan’s Doubly Noncentral Bivariate t Distribution 


If $Y$ is a normal random variable with mean $\delta$ and unit variance, and $S^2$ 
is an independent noncentral chi-squared random variable with degrees 
of freedom $\nu$ and noncentrality parameter $\lambda$, then 

\[ X = \frac{Y\sqrt{\nu}}{S} \tag{4.9} \]

is said to have the doubly noncentral univariate t distribution with degrees 
of freedom $\nu$ and noncentrality parameters $\delta$ and $\lambda$. The properties 
of this distribution have been studied by several authors; see Robbins 
(1948), Patnaik (1955), Krishnan (1959), Krishnan (1967a), Bulgren and 
Amos (1968), and Krishnan (1968); see also Chapter 31 in Johnson et 
al. (1995) for a summary. The pdf, the expectation, and the variance of 
$X$ are given by 


e ee M exp {— (A + 6?) /2} 
fla) = DD. EU) Be U2 + 72,0) +B) 


ôr \'! g2 \ T(t +2k+1)/2 
x | = 1+— , 
(æ) (+7) 


k=0 1 


and 


Var(X) = (1+8?) — iF, (1.5:-3) = {ECO}, y>2, 


where ${}_1F_1$ denotes the confluent hypergeometric function. 

A bivariate analog of (4.9) was defined by Krishnan (1970) as follows. 
Let $(Y_1, Y_2)$ follow a bivariate normal distribution with zero means, unit 
variances, and correlation coefficient $\rho$. Let $(S_1, S_2)$ follow independently 
a noncentral bivariate chi-squared distribution with degrees of freedom 
$\nu$, noncentrality parameter $\lambda$, and correlation coefficient $\rho$ (Krishnan, 
1967b). Then the random vector 


(4.10) 


is said to have the doubly noncentral bivariate t distribution with degrees 
of freedom $\nu$ and noncentrality parameter $\lambda$. Krishnan (1970) derived 
the corresponding joint pdf of $(X_1, X_2)$ and provided an application 
involving the sample means and variances from two correlated nonhomogeneous 
normal populations. The special case of (4.10) for $S_1 = S_2 = S$ 
was considered by Patil and Kovner (1969), who provided expressions 
for the joint cdf of $(X_1, X_2)$ and showed that when the means of $Y_j$ 
are zero the probabilities of $(X_1, X_2)$ in rectangular regions are monotone 
functions of $\rho$. In the special case $S_1 = S_2 = S$ and $\lambda = 0$, the 
distribution of $(X_1, X_2)$ reduces to that of the central bivariate t. 


4.6 Bulgren et al.’s Bivariate t Distribution 
Suppose $Y_1,\ldots,Y_m, Y_{m+1},\ldots,Y_{m+n}$ denote iid normal random variables 
with common mean $\mu$ and common variance $\sigma^2$. Bulgren et al. (1974) 
considered the joint distribution of $(X_1, X_2)$ defined by 


mY, J/m + nY 


ey VSP? , [m= SF4(n-1) 2 
m+n—2 


where 


and 


oe SS (yey). 


The distribution of $(X_1, X_2)$ is bivariate t with a different noncentrality 
parameter for each variable. Note also that $X_1$ and $X_2$ have, respectively, 
$m - 1$ and $m + n - 2$ degrees of freedom. Bulgren et al. (1974) 
provided series representations for the joint pdf of $(X_1, X_2)$. In the 
central case $\mu = 0$, 


g(m+n)/2 z2 —(m+n)/2 
fena) = “To (14+ 5) 
“(-1) (mtn. x2 z 
xy ji T 9 +j + aa 
j=0 
BS pees k/2 
2j m+k n-1 n+m 
D a (Cea) 
k=0 
2j—-k 
xak -vmt , (4.11) 
n(m +n — 2) 
where 
A = Xmm (mma Ulm +n—2), (mal), (mal) 
m+n 2 2 


In the noncentral case $\mu \ne 0$, the joint pdf is even more complicated. 
Letting $n = am$, $a > 0$, and taking $m \to \infty$ in (4.11), one observes that 
the limiting distribution in the central case is the bivariate normal 
distribution with zero means, unit variances, and correlation $\sqrt{1/(1 + a)}$. 


4.7 Siotani’s Noncentral Bivariate t Distribution 


Siotani (1976) considered the most general form of (4.6) introduced by 
Patil and Liao (1970). Let $Y$ be a p-variate normal random vector 
with mean vector $\mu$, unit variances, and correlation matrix $R$. Let 
S = J(V? + V¥)/(2v), where (Yi, V2) has the central bivariate chi- 
squared distribution with degrees of freedom v and correlation coefficient 
$\tau$. Siotani derived the distribution of $X = Y/S$ for general $p$ and $R$. The 
derivation required the joint pdf of $(V_1, V_2)$ that was given by Siotani 
(1959) in the form 


os) vz) aa 1 1 
f (v1, v2) Bee e TE (CEST) 
exp d-n seus l ; 
2(1-7?) 
where 
T ((v + 2k)/2) 


ck(T) EDN (1- py? ek (4.12) 
From this, one can easily obtain the pdf of 


We S _ Y+V 
VIS Vj 2v(1—7?) 
as 
= Dalry T) fov+4an(w), (4.13) 
where 
2a)" encag = 
fov+4k(w) Pep 2u+4k—-1 exp {-vw?} ë (4.14) 


Since $c_0(\tau) + c_1(\tau) + \cdots = 1$, (4.13) is a mixture of (4.14) with the 
weights given by (4.12). Thus the joint pdf of $X$ is also obtained in the 
same form 


f(x) = J lT) Sia (x), 
k=0 


where $c_k(\tau)$ are given by (4.12) and 


} I (v + 2k + p/2) (1 —7)?/? 
(2vr)P/T (v + 2k) |R|? 


—(v+2k+p/2) 
x™R x} 


kk 1 
fpa (X) = ex {-5 A 
1 
2v 


H 
xfi 
T (v +2k + (p+l)/2) 
l= 


. 2 ID (v + 2k + p/2) 


k 
3 2(1-7?)xTR tu 
Qv + (1—7?)xTR-x | 
When $p = 2$, $\mu = 0$, and $\rho = \tau$ ($\rho$ is the correlation coefficient between 
$Y_1$ and $Y_2$) this coincides with the pdfs derived by Patil and Liao (1970). 


4.8 Tiku and Kambo's Bivariate t Distribution 


Suppose $(X_1, X_2)$ has the bivariate normal distribution with means 
$(\mu_1, \mu_2)$, variances $(\sigma_1^2, \sigma_2^2)$, and correlation coefficient $\rho$. Its joint pdf 
can be factorized as 

\[ f(x_1, x_2) = f(x_1 \mid x_2)\, f(x_2), \]

where 

\[ f(x_1 \mid x_2) = \left\{ 2\pi\sigma_1^2 (1-\rho^2) \right\}^{-1/2} \exp\left[ -\frac{1}{2\sigma_1^2(1-\rho^2)} \left\{ x_1 - \mu_1 - \rho\frac{\sigma_1}{\sigma_2}(x_2 - \mu_2) \right\}^2 \right] \tag{4.15} \]

and 

\[ f(x_2) \propto \exp\left\{ -\frac{(x_2 - \mu_2)^2}{2\sigma_2^2} \right\}. \tag{4.16} \]

Numerous nonnormal distributions can be generated by replacing 
$f(x_1 \mid x_2)$ and/or $f(x_2)$ by nonnormal distributions. Tiku and Kambo 
(1992) studied the family of symmetric bivariate distributions obtained 
by replacing (4.16) by the Student's t density 

\[ f(x_2) \propto \frac{1}{\sigma_2\sqrt{k}} \left\{ 1 + \frac{(x_2 - \mu_2)^2}{k\sigma_2^2} \right\}^{-\nu}, \tag{4.17} \]

where $k = 2\nu - 3$ and $\nu \ge 2$. This is motivated by the fact that in many 
applications it is reasonable to assume that the difference $Y_1 - \mu_1 - 
\rho(\sigma_1/\sigma_2)(Y_2 - \mu_2)$ is normally distributed and the regression of $Y_1$ on 
$Y_2$ is linear. Moreover, in numerous applications $Y_2$ represents time-to-failure 
with a distribution (Tiku and Gill, 1989; Gill et al., 1990), which 
might be symmetric but is not normally distributed. Besides, most 
types of time-to-failure data are such that a transformation cannot be 
performed to impose normality on the underlying distribution (see, for 
example, Mann, 1982, page 262). 
On replacing (4.16) by (4.17), the joint pdf of $(X_1, X_2)$ becomes 

\[ f(x_1, x_2) = \frac{\Gamma(\nu)}{\sqrt{2}\,\pi\,\Gamma(\nu - 1/2)\,\sigma_1\sigma_2\sqrt{k(1-\rho^2)}} \left\{ 1 + \frac{(x_2 - \mu_2)^2}{k\sigma_2^2} \right\}^{-\nu} \exp\left[ -\frac{1}{2\sigma_1^2(1-\rho^2)} \left\{ x_1 - \mu_1 - \rho\frac{\sigma_1}{\sigma_2}(x_2 - \mu_2) \right\}^2 \right]. \tag{4.18} \]

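Letting $\nu \to \infty$, (4.18) reduces to the bivariate normal pdf. Writing 
$\mu_{ij} = E\{(Y_1 - \mu_1)^i (Y_2 - \mu_2)^j\}$ for the cross product moment of order 
$i + j$, one observes that all odd-order moments are zero and that the 
first few even-order moments are 

\[ \mu_{11} = \rho\sigma_1\sigma_2, \]
\[ \mu_{20} = \sigma_1^2, \qquad \mu_{02} = \sigma_2^2, \]
\[ \mu_{40} = 3\sigma_1^4 \left( 1 + \frac{2\rho^4}{2\nu - 5} \right), \]
\[ \mu_{31} = 3\rho\sigma_1^3\sigma_2 \left( 1 + \frac{2\rho^2}{2\nu - 5} \right), \]
\[ \mu_{22} = \sigma_1^2\sigma_2^2 \left( 1 + 2\rho^2 + \frac{6\rho^2}{2\nu - 5} \right), \]
\[ \mu_{13} = 3\rho\sigma_1\sigma_2^3 \left( 1 + \frac{2}{2\nu - 5} \right), \]

and 

\[ \mu_{04} = \frac{3(2\nu - 3)}{2\nu - 5}\, \sigma_2^4. \]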

In fact, the moment generating function (mgf) of $(X_1, X_2)$ is given by 

\[ E\left[ \exp(\theta_1 X_1 + \theta_2 X_2) \right] = \exp\left\{ \theta_1 \left( \mu_1 - \rho\frac{\sigma_1}{\sigma_2}\mu_2 \right) + \frac{1}{2}\theta_1^2\sigma_1^2(1-\rho^2) \right\} M_2\!\left( \theta_2 + \rho\frac{\sigma_1}{\sigma_2}\theta_1 \right), \]

where $M_2(\cdot)$ denotes the moment generating function of $X_2$. This 
moment generating function does not exist unless, of course, $\nu = \infty$. 
However, the characteristic function does exist and is given by Sutradhar 
(1986). Estimation issues of the distribution (4.18) are discussed in 
Section 10.1. 


4.9 Conditionally Specified Bivariate t Distribution 


Let $(X,Y)$ be a continuous random vector with joint pdf $f_{X,Y}(x,y)$ over 
$\mathbb{R}^2$. Let $f_X(x)$, $f_Y(y)$ and $f_{X|Y}(x \mid y)$, $f_{Y|X}(y \mid x)$ denote the associated 
marginal and conditional densities, respectively. Assume that $X \mid Y$ and 
$Y \mid X$ are Student's t-distributed with the pdfs 

\[ f_{X|Y}(x \mid y) = \frac{\Gamma((\nu+1)/2)}{\Gamma(1/2)\,\Gamma(\nu/2)} \sqrt{\sigma(y)} \left\{ 1 + \sigma(y) x^2 \right\}^{-(\nu+1)/2} \tag{4.19} \]

and 

\[ f_{Y|X}(y \mid x) = \frac{\Gamma((\nu+1)/2)}{\Gamma(1/2)\,\Gamma(\nu/2)} \sqrt{\tau(x)} \left\{ 1 + \tau(x) y^2 \right\}^{-(\nu+1)/2}, \tag{4.20} \]

where $x \in \mathbb{R}$, $y \in \mathbb{R}$, $\nu > 0$, and $\sigma(y)$, $\tau(x)$ are some positive functions. 
Writing the joint pdf of $(X,Y)$ as a product of marginal and conditional 
densities in both possible manners, one obtains 

\[ f_Y(y) \sqrt{\sigma(y)} \left\{ 1 + \sigma(y) x^2 \right\}^{-(\nu+1)/2} = f_X(x) \sqrt{\tau(x)} \left\{ 1 + \tau(x) y^2 \right\}^{-(\nu+1)/2}, \tag{4.21} \]

where $x \in \mathbb{R}$ and $y \in \mathbb{R}$. Set 

\[ g(y) = \left\{ f_Y(y) \sqrt{\sigma(y)} \right\}^{2/(\nu+1)}, \qquad h(x) = \left\{ f_X(x) \sqrt{\tau(x)} \right\}^{2/(\nu+1)} \tag{4.22} \]

so that, after rearranging, (4.21) becomes 

\[ g(y) + y^2 g(y)\tau(x) - h(x) - x^2 h(x)\sigma(y) = 0, \tag{4.23} \]

which must be solved for $\sigma$, $\tau$, $g$, and $h$. Kottas et al. (1999) recognized 
that (4.23) is a special case of the functional equation 

\[ \sum_{k=1}^{n} f_k(x) g_k(y) = 0, \]

whose most general solution is given in the classical book by Aczel (1966, 
page 161). Thus, with $h(x)$, $x^2h(x)$ and $g(y)$, $y^2g(y)$ being the systems 
of mutually independent functions, the solution of (4.23) is found to be 

\[ \tau(x) = \frac{\lambda_3 + \lambda_4 x^2}{\lambda_1 + \lambda_2 x^2}, \qquad \sigma(y) = \frac{\lambda_2 + \lambda_4 y^2}{\lambda_1 + \lambda_3 y^2} \tag{4.24} \]

and 

\[ h(x) = \frac{1}{\lambda_1 + \lambda_2 x^2}, \qquad g(y) = \frac{1}{\lambda_1 + \lambda_3 y^2} \tag{4.25} \]

for $\lambda_j \in \mathbb{R}$, $j = 1,2,3,4$. Finally, substituting (4.22), (4.24), and (4.25) 
into (4.21), the joint pdf is derived as 

\[ f_{X,Y}(x,y) = N_\nu(\lambda_1, \lambda_2, \lambda_3, \lambda_4) \left\{ \lambda_1 + \lambda_2 x^2 + \lambda_3 y^2 + \lambda_4 x^2 y^2 \right\}^{-(\nu+1)/2}, \tag{4.26} \]

where $x \in \mathbb{R}$, $y \in \mathbb{R}$, and $N_\nu(\cdot)$ denotes the normalizing constant. 
Utilizing certain compatibility conditions given in Arnold and Press (1989), 
Kottas et al. found that (4.26) is a well-defined joint pdf if $\lambda_1 \in \mathbb{R}_+ \cup \{0\}$ 
and $\lambda_j \in \mathbb{R}_+$, $j = 2,3,4$. Moreover, if $\lambda_1 = 0$, then one must have 
$\nu \in (0,1)$. 
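The normalizing constant is given by the integral 

\[ \frac{1}{N_\nu(\lambda_1, \lambda_2, \lambda_3, \lambda_4)} = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \left( \lambda_1 + \lambda_2 x^2 + \lambda_3 y^2 + \lambda_4 x^2 y^2 \right)^{-(\nu+1)/2} dx\, dy. \tag{4.27} \]

In the case $\lambda_1 \ne 0$, making the transformation $s = (\lambda_2/\lambda_1)x^2$, $t = 
(\lambda_3/\lambda_1)y^2$, letting $\phi = \lambda_1\lambda_4/(\lambda_2\lambda_3)$, and using the integral 
representation of the Beta function, 

\[ B(a,b) = \int_0^\infty z^{a-1} (1+z)^{-(a+b)}\, dz, \quad a > 0, \; b > 0, \]

one obtains 

\[ \frac{1}{N_\nu(\lambda_1, \lambda_2, \lambda_3, \lambda_4)} = \frac{B\left(\frac{1}{2}, \frac{\nu}{2}\right)}{\lambda_1^{(\nu-1)/2} \sqrt{\lambda_2\lambda_3}} \int_0^\infty \frac{dz}{(1+z)^{\nu/2} \sqrt{z(1+\phi z)}}. \]

Letting $w = z/(1+z)$ and manipulating, Kottas et al. obtained 

\[ N_\nu(\lambda_1, \lambda_2, \lambda_3, \lambda_4) = \frac{\lambda_1^{(\nu-1)/2} \sqrt{\lambda_2\lambda_3}}{B\left(\frac{1}{2}, \frac{\nu}{2}\right) I\left(\frac{1}{2}, \frac{1}{2}, \frac{\nu+1}{2}; 1-\phi\right)}, \tag{4.28} \]

where 

\[ I(a,b,c;z) = \int_0^1 w^{b-1} (1-w)^{c-b-1} (1-zw)^{-a}\, dw \tag{4.29} \]

for $c > b > 0$. In the case $\lambda_1 = 0$, similar arguments show that 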


p/2\(l-v)/2 
N,(0, A2,A3,A4) = MAVA 
B (3,3) B (15%: 3) 
where $0 < \nu < 1$. The integral (4.29) converges for $z < 1$. For $|z| > 1$, 
Kottas et al. provided an alternative representation of (4.28) in terms 
of the Gauss hypergeometric function (see, for example, Magnus et al., 
1966, page 54). It is also possible to represent (4.28) in terms of elliptic 
integrals of the first and second kind (Carlson, 1977, Chapter 9). For 
example, if $\nu = 1$, then (4.28) can be easily rearranged to yield 


VAIN 


Nz (Az, A2, A3,A = =, 
Ainanao V2aRr (0, 1/¢, 1) 
where $R_F$ is the elliptic integral of the first kind defined by 

\[ R_F(a,b,c) = \frac{1}{2} \int_0^\infty \left\{ (z+a)(z+b)(z+c) \right\}^{-1/2} dz \]

with a, b, c nonnegative, and at most one of them equal to zero. 
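For $\lambda_1 > 0$ the normalizing constant can also be evaluated numerically from the one-dimensional integral obtained after the Beta-function step. The sketch below is illustrative, not the computation used by Kottas et al.: the function names are hypothetical, the substitution $z = \tan^2 t$ is introduced here only to remove the endpoint singularity, and the code is restricted to $\nu \ge 1$ so that the transformed integrand stays bounded.

```python
import math


def beta_fn(a, b):
    """Beta function via log-gamma."""
    return math.exp(math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b))


def simpson(f, a, b, n=2000):
    """Composite Simpson's rule with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def norm_const(lam1, lam2, lam3, lam4, nu):
    """Normalizing constant N_nu of (4.26) for positive lambdas and nu >= 1."""
    if min(lam1, lam2, lam3, lam4) <= 0 or nu < 1:
        raise ValueError("this sketch assumes positive lambdas and nu >= 1")
    dep = lam1 * lam4 / (lam2 * lam3)  # dependence parameter phi
    # With z = tan(t)^2 the integral over z in (0, inf) becomes
    # 2 * int_0^{pi/2} cos(t)^(nu-1) / sqrt(cos(t)^2 + phi * sin(t)^2) dt.
    g = lambda t: 2 * math.cos(t) ** (nu - 1) / math.sqrt(
        math.cos(t) ** 2 + dep * math.sin(t) ** 2)
    integral = simpson(g, 0.0, math.pi / 2)
    inverse = beta_fn(0.5, nu / 2) / (lam1 ** ((nu - 1) / 2)
                                      * math.sqrt(lam2 * lam3)) * integral
    return 1.0 / inverse


if __name__ == "__main__":
    # phi = 1 here, so X and Y are independent and N has a closed form.
    print(norm_const(2.0, 3.0, 4.0, 6.0, 3))
    print(2.0 * math.sqrt(12.0) / beta_fn(0.5, 1.5) ** 2)
```

When $\phi = 1$ the joint density factorizes, so the constant reduces to $\lambda_1^{(\nu-1)/2}\sqrt{\lambda_2\lambda_3}/B(1/2,\nu/2)^2$, which the usage lines compare against.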

If $0 < \nu \le 1$, then (4.26) does not possess finite moments; thus, 
from here on we shall consider the case $\nu > 1$. If $\nu > \max(m, n)$, for 
nonnegative integers $m$, $n$, then Kottas et al. showed that 

m+1 vr=m m+) n+l vl. 
pees) = e ae e 
mur HET (3) T (3,91 - 4) 
provided that both m and n are even or zero. The expectation is zero if 
at least one of m or n is odd. This suggests that the distribution may 
be an appropriate model for uncorrelated but nonindependent data. 

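From relations (4.21), (4.24), and (4.26) it is immediate that the
marginal densities are

$$f_X(x) = \sqrt{\mu_1}\left\{I\left(\tfrac{1}{2},\tfrac{1}{2},\tfrac{\nu+1}{2};1-\phi\right)\sqrt{1+\phi\mu_1x^2}\,\left(1+\mu_1x^2\right)^{\nu/2}\right\}^{-1}$$

and

$$f_Y(y) = \sqrt{\mu_2}\left\{I\left(\tfrac{1}{2},\tfrac{1}{2},\tfrac{\nu+1}{2};1-\phi\right)\sqrt{1+\phi\mu_2y^2}\,\left(1+\mu_2y^2\right)^{\nu/2}\right\}^{-1},$$

where $x \in \mathbb{R}$ and $y \in \mathbb{R}$. Here, $\mu_j = \lambda_{j+1}/\lambda_1$, $j = 1,2$ are the intensity
parameters while $\phi$ and $\nu$ are the dependence and scale parameters,
respectively. It is easily noted that $X$ and $Y$ are independent if and
only if $\phi = 1$. The graph of the joint pdf is symmetric and bell-shaped
and takes the standard form when $\mu_1 = \mu_2 = 1$.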

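From relations (4.19)-(4.20) and (4.24)-(4.25) it is immediate that
$X \mid Y$ has the Student's t distribution with degrees of freedom $\nu$ and
scale parameter $\{1/(\nu\mu_1)\}(1+\mu_2y^2)/(1+\phi\mu_2y^2)$, and that $Y \mid X$ is also
Student's t with degrees of freedom $\nu$ and scale parameter
$\{1/(\nu\mu_2)\}(1+\mu_1x^2)/(1+\phi\mu_1x^2)$, where $\mu_j = \lambda_{j+1}/\lambda_1$, $j = 1,2$. Consequently, the
conditional moments are

$$E(X^m \mid Y = y) = \frac{\Gamma\left(\frac{m+1}{2}\right)\Gamma\left(\frac{\nu-m}{2}\right)}{\sqrt{\pi}\,\Gamma\left(\frac{\nu}{2}\right)}\left\{\frac{1+\mu_2y^2}{\mu_1\left(1+\phi\mu_2y^2\right)}\right\}^{m/2}$$

and

$$E(Y^m \mid X = x) = \frac{\Gamma\left(\frac{m+1}{2}\right)\Gamma\left(\frac{\nu-m}{2}\right)}{\sqrt{\pi}\,\Gamma\left(\frac{\nu}{2}\right)}\left\{\frac{1+\mu_1x^2}{\mu_2\left(1+\phi\mu_1x^2\right)}\right\}^{m/2}$$

provided that $m$ is an even number less than $\nu$. If $m$ is odd, then the
corresponding conditional moments are zero.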
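
Because both conditionals are Student's t, the model lends itself to Gibbs sampling. The following is a minimal sketch, not taken from Kottas et al.; it assumes the stated "scale parameter" expressions are squared scales (so that $X \mid Y = y$ equals the square root of that expression times a standard $t_\nu$ variate), and the function names are illustrative only.

```python
import math
import random

def student_t(nu, rng):
    """Draw a standard Student's t variate with nu degrees of freedom."""
    z = rng.gauss(0.0, 1.0)
    w = rng.gammavariate(nu / 2.0, 2.0)  # chi-squared with nu degrees of freedom
    return z / math.sqrt(w / nu)

def gibbs_t_conditionals(n, nu, mu1, mu2, phi, seed=0, burn_in=500):
    """Gibbs sampler alternating the two t conditionals; returns n (x, y) pairs."""
    rng = random.Random(seed)
    x, y = 0.0, 0.0
    draws = []
    for i in range(n + burn_in):
        # X | Y = y: squared scale (1 + mu2 y^2) / (nu mu1 (1 + phi mu2 y^2)) (assumed reading)
        sx2 = (1.0 + mu2 * y * y) / (nu * mu1 * (1.0 + phi * mu2 * y * y))
        x = math.sqrt(sx2) * student_t(nu, rng)
        # Y | X = x: the same expression with the roles of mu1 and mu2 exchanged
        sy2 = (1.0 + mu1 * x * x) / (nu * mu2 * (1.0 + phi * mu1 * x * x))
        y = math.sqrt(sy2) * student_t(nu, rng)
        if i >= burn_in:
            draws.append((x, y))
    return draws
```

With $\phi = 1$ the two updates no longer depend on each other, which mirrors the independence remark above.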

In the special case $\nu = 1$, (4.26) reduces to the centered Cauchy conditionals
model of Anderson and Arnold (1991). The limiting case $\phi \to 0$
gives the bivariate Pearson type VII distribution (Johnson, 1987, page
117) with location parameters equal to zero and uncorrelated components.
If $2\nu$ is a positive integer, then this limit distribution reduces
to a special case of the general bivariate t distribution (see, for example,
Johnson and Kotz, 1972, page 134, relation 1) with uncorrelated
components and $2\nu$ degrees of freedom. For $\mu_1 = \mu_2$ and $\nu = 2$, the
limit distribution reduces to the bivariate Cauchy distribution (Mardia,
1970a, page 86) while for $\mu_1 = \mu_2$ and with $\nu + 1$ in place of $\nu$ it gives the bivariate
t distribution (Johnson and Kotz, 1972, page 134, relation 2) with $\nu$
degrees of freedom. In the latter case, the standard bivariate normal
distribution with independent components arises as a further limiting
case when $\nu \to \infty$. Other special cases of (4.26) are the centered normal
conditionals model studied by Sarabia (1995) and the Beta conditionals
model of the second kind (Castillo and Sarabia, 1990).


4.10 Jones’ Bivariate t Distribution 


Let $Z_1$, $Z_2$, $W$ be mutually independent random variables with $Z_i$ having
the standard normal distribution and $W$ having the chi-squared distribution
with degrees of freedom $\nu_1$. Then the standard bivariate t
distribution with degrees of freedom $\nu_1$ is the joint distribution of

$$\left(\frac{\sqrt{\nu_1}Z_1}{\sqrt{W}},\ \frac{\sqrt{\nu_1}Z_2}{\sqrt{W}}\right). \tag{4.31}$$

One disadvantage of this model is that the two univariate marginals
(which are Student's t) have the same degrees of freedom parameter
and hence the same amount of tailweight. Jones (2002b) provided an
alternative distribution with Student's t marginals, each with its own
arbitrary degrees of freedom parameter. Precisely, if $W_1$, $W_2$ are independent
chi-squared random variables (also independent of $Z_1$, $Z_2$)
with degrees of freedom $\nu_1$ and $\nu_2 - \nu_1$, respectively, then Jones (2002b)
considered the joint distribution of

$$(X_1, X_2) = \left(\frac{\sqrt{\nu_1}Z_1}{\sqrt{W_1}},\ \frac{\sqrt{\nu_2}Z_2}{\sqrt{W_1+W_2}}\right). \tag{4.32}$$


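Note that the $i$th marginal of this distribution is Student's t with degrees
of freedom $\nu_i$. It is easy to see that the correlation between $X_1$ and $X_2$
is zero, a property also shared by (4.31). If $r_1 < \nu_1$ and $r_1 + r_2 < \nu_2$,
then the product moment is given by

$$E\left(X_1^{r_1}X_2^{r_2}\right) = \frac{\nu_1^{r_1/2}\nu_2^{r_2/2}\,\Gamma\left(\frac{r_1+1}{2}\right)\Gamma\left(\frac{r_2+1}{2}\right)\Gamma\left(\frac{\nu_1-r_1}{2}\right)\Gamma\left(\frac{\nu_2-r_1-r_2}{2}\right)}{\pi\,\Gamma\left(\frac{\nu_1}{2}\right)\Gamma\left(\frac{\nu_2-r_1}{2}\right)}$$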


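if $r_1$ and $r_2$ are even and is zero otherwise. The joint pdf of $X_1$ and $X_2$ is

$$f(x_1,x_2) = C\left(1+\frac{x_1^2}{\nu_1}+\frac{x_2^2}{\nu_2}\right)^{-(1+\nu_2/2)} {}_2F_1\left(1+\frac{\nu_2}{2},\frac{\nu_2-\nu_1}{2};\frac{1+\nu_2}{2};\frac{x_1^2/\nu_1}{1+x_1^2/\nu_1+x_2^2/\nu_2}\right), \tag{4.33}$$

where

$$C = \frac{\Gamma\left(\frac{1+\nu_1}{2}\right)\Gamma\left(1+\frac{\nu_2}{2}\right)}{\pi\sqrt{\nu_1\nu_2}\,\Gamma\left(\frac{\nu_1}{2}\right)\Gamma\left(\frac{1+\nu_2}{2}\right)}$$

and ${}_2F_1$ denotes the Gauss hypergeometric function. The conditional
pdf of $X_2$ given $X_1 = x_1$ is

$$f(x_2 \mid x_1) = c\left(u_1+\frac{x_2^2}{\nu_2}\right)^{-(1+\nu_2/2)} {}_2F_1\left(1+\frac{\nu_2}{2},\frac{\nu_2-\nu_1}{2};\frac{1+\nu_2}{2};\frac{u_1-1}{u_1+x_2^2/\nu_2}\right), \tag{4.34}$$

where $u_1 = 1 + x_1^2/\nu_1$ and

$$c = \frac{\Gamma\left(1+\frac{\nu_2}{2}\right)u_1^{(1+\nu_1)/2}}{\sqrt{\pi\nu_2}\,\Gamma\left(\frac{1+\nu_2}{2}\right)}.$$

If $\nu_2 + 1 > r$, then the conditional $r$th moment is given by

$$E\left(X_2^r \mid X_1 = x_1\right) = \nu_2^{r/2}u_1^{(\nu_1-\nu_2+r)/2}\frac{\Gamma\left(\frac{1+r}{2}\right)\Gamma\left(\frac{\nu_2-r+1}{2}\right)}{\sqrt{\pi}\,\Gamma\left(\frac{1+\nu_2}{2}\right)}\, {}_2F_1\left(\frac{\nu_2-r+1}{2},\frac{\nu_2-\nu_1}{2};\frac{1+\nu_2}{2};\frac{u_1-1}{u_1}\right) \tag{4.35}$$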
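if $r$ is even and is zero otherwise. Setting $\nu_2 = \nu_1 = \nu$ in (4.34)-(4.35),
one obtains the corresponding forms for the standard bivariate
t distribution (see Section 1.11). Note that the conditional variance
$\mathrm{Var}(X_2 \mid X_1 = x_1)$ increases with $|x_1|$. In a parallel fashion, with
$u_2 = 1 + x_2^2/\nu_2$, the conditional distribution of $X_1$ given $X_2 = x_2$ is

$$f(x_1 \mid x_2) = c\left(u_2+\frac{x_1^2}{\nu_1}\right)^{-(1+\nu_2/2)} {}_2F_1\left(1+\frac{\nu_2}{2},\frac{\nu_2-\nu_1}{2};\frac{1+\nu_2}{2};\frac{x_1^2/\nu_1}{u_2+x_1^2/\nu_1}\right),$$

where

$$c = \frac{u_2^{(1+\nu_2)/2}\,\Gamma\left(\frac{1+\nu_1}{2}\right)\Gamma\left(\frac{\nu_2}{2}\right)\Gamma\left(1+\frac{\nu_2}{2}\right)}{\sqrt{\pi\nu_1}\,\Gamma\left(\frac{\nu_1}{2}\right)\Gamma^2\left(\frac{1+\nu_2}{2}\right)}.$$

This time, the conditional $r$th moments exist provided $\nu_1 > r$, unless
$\nu_1 = \nu_2$, in which case one needs $1 + \nu_1 > r$. The odd conditional
moments are again zero and the even conditional moments are given by
the simpler form

$$E\left(X_1^r \mid X_2 = x_2\right) = \nu_1^{r/2}u_2^{r/2}\frac{\Gamma\left(\frac{1+r}{2}\right)\Gamma\left(\frac{\nu_1-r}{2}\right)\Gamma\left(\frac{\nu_2}{2}\right)\Gamma\left(\frac{1+\nu_2-r}{2}\right)}{\sqrt{\pi}\,\Gamma\left(\frac{\nu_1}{2}\right)\Gamma\left(\frac{\nu_2-r}{2}\right)\Gamma\left(\frac{1+\nu_2}{2}\right)}.$$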
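
The construction behind Jones' bivariate t distribution is easy to simulate, and the product-moment expression can be checked against the marginal variances. The sketch below is illustrative only: the function names are ours, the two normal numerators are divided by $\sqrt{W_1}$ and $\sqrt{W_1+W_2}$ as described in this section, and the moment formula is coded as we read it from the text.

```python
import math
import random

def jones_bivariate_t(nu1, nu2, rng):
    """One draw of (X1, X2) from Jones' construction; requires nu2 > nu1 > 0."""
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    w1 = rng.gammavariate(nu1 / 2.0, 2.0)          # chi-squared, nu1 d.f.
    w2 = rng.gammavariate((nu2 - nu1) / 2.0, 2.0)  # chi-squared, nu2 - nu1 d.f.
    return (math.sqrt(nu1) * z1 / math.sqrt(w1),
            math.sqrt(nu2) * z2 / math.sqrt(w1 + w2))

def product_moment(r1, r2, nu1, nu2):
    """E(X1^r1 X2^r2) for r1 < nu1 and r1 + r2 < nu2; zero unless both are even."""
    if r1 % 2 or r2 % 2:
        return 0.0
    g = math.gamma
    num = (g((r1 + 1) / 2) * g((r2 + 1) / 2)
           * g((nu1 - r1) / 2) * g((nu2 - r1 - r2) / 2))
    den = math.pi * g(nu1 / 2) * g((nu2 - r1) / 2)
    return nu1 ** (r1 / 2) * nu2 ** (r2 / 2) * num / den
```

Setting $(r_1, r_2) = (2, 0)$ or $(0, 2)$ recovers the Student's t variances $\nu_i/(\nu_i - 2)$, which is a quick consistency check on the formula.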

The construction (4.32) can be easily extended to the multivariate
case. Two straightforward extensions are

• Let $Z_1, \ldots, Z_p$, $W_1, \ldots, W_p$ be mutually independent random variables
  with $Z_i$ having the standard normal distribution and $W_i$ having
  the chi-squared distribution with degrees of freedom $\nu_i - \nu_{i-1}$ (with $\nu_0 = 0$). Then,

  $$(X_1, X_2, \ldots, X_p) = \left(\frac{\sqrt{\nu_1}Z_1}{\sqrt{W_1}},\ \frac{\sqrt{\nu_2}Z_2}{\sqrt{W_1+W_2}},\ \ldots,\ \frac{\sqrt{\nu_p}Z_p}{\sqrt{W_1+\cdots+W_p}}\right) \tag{4.36}$$

  has a multivariate distribution with univariate marginals that are Student's
  t distributed with degrees of freedom $\nu_i$, $i = 1, \ldots, p$. The
  bivariate marginals of (4.36) have the distribution of (4.32).

• With the notation as above,

  $$(X_1, X_2, \ldots, X_p) = \left(\frac{\sqrt{\nu_1}Z_1}{\sqrt{W_1}},\ \frac{\sqrt{\nu_2}Z_2}{\sqrt{W_1+U_2}},\ \ldots,\ \frac{\sqrt{\nu_p}Z_p}{\sqrt{W_1+U_p}}\right) \tag{4.37}$$

  has a multivariate distribution with the same univariate and bivariate
  marginals. Here, $U_i$ are independent chi-squared random variables
  (also independent of $Z_i$, $W_i$) with degrees of freedom $\nu_i - \nu_1$, $i = 2, \ldots, p$.

Fig. 4.1. Jones' bivariate skew t pdf (4.33) for (a) $\nu_1 = 2$ and $\nu_2 = 3$; and (b)
$\nu_1 = 2$ and $\nu_2 = 20$
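
The first multivariate extension, (4.36), amounts to dividing each normal variate by the square root of a running sum of independent chi-squared variates. A minimal sketch, assuming strictly increasing degrees of freedom and taking the first increment to be $\nu_1$ itself; the function name is hypothetical.

```python
import math
import random

def jones_multivariate_t(nus, rng):
    """One draw from the cumulative-sum construction; nus must be strictly increasing."""
    total, prev, x = 0.0, 0.0, []
    for nu in nus:
        # W_i is chi-squared with nu_i - nu_{i-1} degrees of freedom (nu_0 taken as 0)
        total += rng.gammavariate((nu - prev) / 2.0, 2.0)
        # total is now W_1 + ... + W_i, chi-squared with nu_i degrees of freedom
        x.append(math.sqrt(nu) * rng.gauss(0.0, 1.0) / math.sqrt(total))
        prev = nu
    return x
```

Each coordinate is then marginally Student's t with its own degrees of freedom, while all coordinates share the early $W_i$ terms and are therefore dependent.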


Further extensions of (4.32) arise by adding further independent chi-squared
random variables inside the square roots in the denominators
of all variables in (4.36) or by adding a single further independent chi-squared
random variable inside the square root in the denominator of
$X_1$ in (4.37).

Jones (2002a) provided another bivariate generalization of (4.31). This
generalization has the skew t distribution (Jones, 2001a) as its marginals.
If $U$ denotes a standard beta random variable with parameters $a$ and $c$,
then a skew t variate is defined by

$$X = \frac{\sqrt{a+c}\,(2U-1)}{2\sqrt{U(1-U)}}.$$

The corresponding pdf is

$$f(x;a,c) = \frac{2^{1-a-c}\,\Gamma(a+c)}{\sqrt{a+c}\,\Gamma(a)\Gamma(c)}\left\{1+\frac{x}{\sqrt{a+c+x^2}}\right\}^{a+1/2}\left\{1-\frac{x}{\sqrt{a+c+x^2}}\right\}^{c+1/2}. \tag{4.38}$$

The standard Student's t is the particular case for $\nu = 2a$ when $a = c$.
If $a > c$, then (4.38) has positive skewness; also, $f(x;c,a) = f(-x;a,c)$.
Further details about (4.38) are given in Jones and Faddy (2002). The
bivariate generalization proposed in Jones (2002a) is constructed in the
same way as (4.38): Specifically, if $(U,V)$ denotes a Dirichlet random
vector with the joint pdf

$$f(u,v) = \frac{\Gamma(a+b+c)}{\Gamma(a)\Gamma(b)\Gamma(c)}\,u^{a-1}v^{b-1}(1-u-v)^{c-1}$$

(where $u > 0$, $v > 0$, and $u + v < 1$), then define

$$(X_1, X_2) = \left(\frac{\sqrt{d}\,(2U-1)}{2\sqrt{U(1-U)}},\ \frac{\sqrt{d}\,(1-2V)}{2\sqrt{V(1-V)}}\right), \tag{4.39}$$

where $d = a + b + c$. It can be verified that the joint pdf of $(X_1, X_2)$ is

$$f(x_1,x_2) = \frac{d\,\Gamma(d+1)}{2^{d-1}\Gamma(a)\Gamma(b)\Gamma(c)}\,\frac{\left(1+\frac{x_1}{\sqrt{d+x_1^2}}\right)^{a-1}\left(1-\frac{x_2}{\sqrt{d+x_2^2}}\right)^{b-1}}{\left(d+x_1^2\right)^{3/2}\left(d+x_2^2\right)^{3/2}}\left(\frac{x_2}{\sqrt{d+x_2^2}}-\frac{x_1}{\sqrt{d+x_1^2}}\right)^{c-1} \tag{4.40}$$

on $x_1 < x_2$.
Because of a direct analogy with the Dirichlet distribution, only one of
the two marginals of (4.40) can be a symmetric Student's t distribution,
the other necessarily being skewed. This Student's t distribution will
have degrees of freedom $d$, and any skew t marginal will have a total
parameter value of $d$, but divided up into unequal amounts. In this
sense, marginals of (4.40) are most closely associated with Student's t
distributions with degrees of freedom $d$.
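
The univariate skew t density (4.38) that governs these marginals can be checked numerically. The sketch below codes the density as we read it from (4.38), with constant $2^{1-a-c}\Gamma(a+c)/\{\sqrt{a+c}\,\Gamma(a)\Gamma(c)\}$ and exponents $a + 1/2$ and $c + 1/2$, and compares the symmetric case $a = c$ with the Student's t density on $2a$ degrees of freedom. Function names are illustrative.

```python
import math

def skew_t_pdf(x, a, c):
    """Skew t density with parameters a, c > 0, as read from (4.38)."""
    s = math.sqrt(a + c + x * x)
    log_const = ((1.0 - a - c) * math.log(2.0) + math.lgamma(a + c)
                 - 0.5 * math.log(a + c) - math.lgamma(a) - math.lgamma(c))
    return math.exp(log_const) * (1.0 + x / s) ** (a + 0.5) * (1.0 - x / s) ** (c + 0.5)

def student_t_pdf(x, nu):
    """Standard Student's t density with nu degrees of freedom."""
    log_const = (math.lgamma((nu + 1.0) / 2.0) - math.lgamma(nu / 2.0)
                 - 0.5 * math.log(nu * math.pi))
    return math.exp(log_const) * (1.0 + x * x / nu) ** (-(nu + 1.0) / 2.0)
```

Swapping the two parameters reflects the density about zero, so `skew_t_pdf(x, c, a)` and `skew_t_pdf(-x, a, c)` agree.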

Note that if instead of $(X_1, X_2)$ in (4.39) the transformation was made
to $(-X_1, X_2)$, then one would have obtained the equivalent distribution
on $x_1 + x_2 > 0$. Also, $(-X_1, -X_2)$ would have given the equivalent
distribution on $x_2 < x_1$ and $(X_1, -X_2)$ as the same on $x_2 + x_1 < 0$.


Fig. 4.2. Jones’ bivariate skew t pdf (4.40) for (a) a = 1, and c 
a = 3, b = 4, and c = 5; (c) a = 5, b = 1, and c = 1; and (d)a = 1, b 
c=1 


The corresponding changes to (4.40) would simply have been to make 
corresponding changes to the signs of zı and z2. 

The means and variances associated with (4.40) can be easily obtained
from the results provided in Jones (2001a):

$$E(X_1) = \frac{\sqrt{d}}{2}\,\frac{\Gamma(a-1/2)\,\Gamma(b+c-1/2)}{\Gamma(a)\,\Gamma(b+c)}\,(a-b-c),$$

$$E(X_2) = \frac{\sqrt{d}}{2}\,\frac{\Gamma(a+c-1/2)\,\Gamma(b-1/2)}{\Gamma(a+c)\,\Gamma(b)}\,(a-b+c),$$

$$\mathrm{Var}(X_1) = \frac{d}{4}\,\frac{(a-b-c)^2+d-2}{(a-1)(b+c-1)} - \left\{E(X_1)\right\}^2,$$

and

$$\mathrm{Var}(X_2) = \frac{d}{4}\,\frac{(a-b+c)^2+d-2}{(a+c-1)(b-1)} - \left\{E(X_2)\right\}^2.$$

The covariance between $X_1$ and $X_2$ appears not to be available in closed
form.
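
Since the covariance is not available in closed form, simulation from the Dirichlet construction (4.39) is a natural way to explore it. The following sketch draws $(U, V)$ through independent gamma variates and applies the transformation as we read it, with $X_2$ built from $V$; the helper names and the closed-form mean are our transcription of the expressions above, not code from Jones (2002a).

```python
import math
import random

def jones_skew_t_pair(a, b, c, rng):
    """One draw of (X1, X2) built from (U, V) ~ Dirichlet(a, b, c)."""
    g1 = rng.gammavariate(a, 1.0)
    g2 = rng.gammavariate(b, 1.0)
    g3 = rng.gammavariate(c, 1.0)
    total = g1 + g2 + g3
    u, v = g1 / total, g2 / total
    d = a + b + c
    x1 = math.sqrt(d) * (2.0 * u - 1.0) / (2.0 * math.sqrt(u * (1.0 - u)))
    x2 = math.sqrt(d) * (1.0 - 2.0 * v) / (2.0 * math.sqrt(v * (1.0 - v)))
    return x1, x2

def mean_x1(a, b, c):
    """Closed-form E(X1); requires a > 1/2 and b + c > 1/2."""
    d = a + b + c
    ratio = math.exp(math.lgamma(a - 0.5) + math.lgamma(b + c - 0.5)
                     - math.lgamma(a) - math.lgamma(b + c))
    return 0.5 * math.sqrt(d) * ratio * (a - b - c)

def sample_covariance(pairs):
    """Plain moment estimate of Cov(X1, X2) from simulated pairs."""
    n = len(pairs)
    m1 = sum(p[0] for p in pairs) / n
    m2 = sum(p[1] for p in pairs) / n
    return sum((p[0] - m1) * (p[1] - m2) for p in pairs) / n
```

Every simulated pair satisfies $x_1 < x_2$, because $U + V < 1$ and the map $u \mapsto (2u-1)/\{2\sqrt{u(1-u)}\}$ is increasing.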


5 


Multivariate Generalizations and Related 
Distributions 


This chapter contains a large number of modifications and extensions
of the standard multivariate t distribution introduced in (1.1). Some of
them are of a somewhat complex nature, and a careful reading is required
to see the forest behind the trees.


5.1 Kshirsagar’s Noncentral Multivariate t Distribution 


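One of the earliest results in the area of noncentral multivariate t distributions
is that due to Kshirsagar (1961). Let $\mathbf{Y}$ be a $p$-variate random
vector having the normal distribution with mean vector $\boldsymbol{\mu}$, common variance
$\sigma^2$, and correlation matrix $\mathbf{R}$. Let $S^2$ be distributed independently
of $\mathbf{Y}$ according to a chi-squared distribution with degrees of freedom $\nu$.
Kshirsagar (1961) considered the distribution of $\mathbf{X} = \mathbf{Y}/S$ and showed
that it has the joint pdf

$$f(\mathbf{x}) = \frac{\exp\left(-\frac{1}{2}\boldsymbol{\xi}^T\mathbf{R}^{-1}\boldsymbol{\xi}\right)\Gamma((\nu+p)/2)}{(\pi\nu)^{p/2}\,\Gamma(\nu/2)\,|\mathbf{R}|^{1/2}}\left\{1+\frac{1}{\nu}\mathbf{x}^T\mathbf{R}^{-1}\mathbf{x}\right\}^{-(\nu+p)/2}\sum_{k=0}^{\infty}\frac{\Gamma((\nu+p+k)/2)}{k!\,\Gamma((\nu+p)/2)}\left[\frac{\sqrt{2}\,\boldsymbol{\xi}^T\mathbf{R}^{-1}\mathbf{x}}{\sqrt{\nu+\mathbf{x}^T\mathbf{R}^{-1}\mathbf{x}}}\right]^k, \tag{5.1}$$

where $\boldsymbol{\xi} = \boldsymbol{\mu}/\sigma$. This noncentral distribution reduces to the form of
(1.1) when $\boldsymbol{\mu} = \mathbf{0}$.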
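
For $p = 1$ and $\mathbf{R} = 1$ the series in (5.1) can be summed directly, which gives a quick numerical check that the density integrates to one and collapses to the central Student's t when the noncentrality vanishes. This is a sketch under our reading of (5.1), with the leading factor $\exp(-\xi^2/2)\,\Gamma((\nu+1)/2)/\{\sqrt{\pi\nu}\,\Gamma(\nu/2)\}$; the truncation point `terms` is an arbitrary choice that is adequate for moderate $\xi$.

```python
import math

def kshirsagar_pdf_1d(x, nu, xi, terms=80):
    """Truncated-series evaluation of the noncentral pdf for p = 1, R = 1."""
    q = x * x
    log_lead = (-0.5 * xi * xi + math.lgamma((nu + 1.0) / 2.0)
                - 0.5 * math.log(math.pi * nu) - math.lgamma(nu / 2.0)
                - 0.5 * (nu + 1.0) * math.log1p(q / nu))
    z = math.sqrt(2.0) * xi * x / math.sqrt(nu + q)
    total = 0.0
    for k in range(terms):
        # coefficient Gamma((nu + 1 + k)/2) / (k! Gamma((nu + 1)/2)), via logs
        log_coeff = (math.lgamma((nu + 1.0 + k) / 2.0) - math.lgamma(k + 1.0)
                     - math.lgamma((nu + 1.0) / 2.0))
        total += math.exp(log_coeff) * z ** k
    return math.exp(log_lead) * total

def trapezoid(f, lo, hi, n):
    """Composite trapezoid rule on n equal subintervals."""
    h = (hi - lo) / n
    s = 0.5 * (f(lo) + f(hi)) + sum(f(lo + i * h) for i in range(1, n))
    return h * s
```

With `xi = 0` only the $k = 0$ term survives, so the function returns the central $t_\nu$ density.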

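We noted earlier in Section 1.11 that, if $\mathbf{X}$ has the central multivariate
t distribution with degrees of freedom $\nu$ and correlation matrix $\mathbf{R}$, and
$\mathbf{X}$ and $\mathbf{R}$ are partitioned as

$$\mathbf{X} = \begin{pmatrix}\mathbf{X}_1\\ \mathbf{X}_2\end{pmatrix} \quad\text{and}\quad \mathbf{R} = \begin{pmatrix}\mathbf{R}_{11} & \mathbf{R}_{12}\\ \mathbf{R}_{21} & \mathbf{R}_{22}\end{pmatrix},$$

where $\mathbf{X}_1$ is $p_1 \times 1$ and $\mathbf{R}_{11}$ is $p_1 \times p_1$, then

$$\mathbf{Z}_1 = \mathbf{X}_1$$

and

$$\mathbf{Z}_2 = \sqrt{\frac{\nu+p_1}{\nu}}\left(1+\frac{1}{\nu}\mathbf{X}_1^T\mathbf{R}_{11}^{-1}\mathbf{X}_1\right)^{-1/2}\left(\mathbf{X}_2-\mathbf{R}_{21}\mathbf{R}_{11}^{-1}\mathbf{X}_1\right)$$

are independently distributed, each according to a central multivariate
t distribution. This result does not remain true for the noncentral distribution
(5.1). Actually, Siotani (1976) showed that if

$$\boldsymbol{\xi} = \begin{pmatrix}\boldsymbol{\xi}_1\\ \boldsymbol{\xi}_2\end{pmatrix}$$

is the partition of $\boldsymbol{\xi}$ corresponding to that of $\mathbf{X}$,

$$\boldsymbol{\xi}_{2\cdot 1} = \boldsymbol{\xi}_2 - \mathbf{R}_{21}\mathbf{R}_{11}^{-1}\boldsymbol{\xi}_1$$

and

$$\mathbf{R}_{22\cdot 1} = \mathbf{R}_{22} - \mathbf{R}_{21}\mathbf{R}_{11}^{-1}\mathbf{R}_{12},$$

then the joint pdf of $\mathbf{Z}_1$ and $\mathbf{Z}_2$ is

$$f(\mathbf{z}_1,\mathbf{z}_2) = K\exp\left(-\frac{1}{2}\boldsymbol{\xi}_1^T\mathbf{R}_{11}^{-1}\boldsymbol{\xi}_1 - \frac{1}{2}\boldsymbol{\xi}_{2\cdot 1}^T\mathbf{R}_{22\cdot 1}^{-1}\boldsymbol{\xi}_{2\cdot 1}\right)\, t_\nu\left(\mathbf{z}_1;\mathbf{R}_{11},p_1\right)\, t_{\nu+p_1}\left(\mathbf{z}_2;\mathbf{R}_{22\cdot 1},p-p_1\right),$$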


where the last two terms denote the pdfs of central multivariate t distributions
with appropriate parameters and $K$ is given by the formidable
expression
- Sy Peters DP) (2) (a) 
k 


zi Ri TE 


yl+2t Rī} zı /v
x% (27 R21)
{1 +27 R3212/(v + pi) 


Siotani (1976) also derived the corresponding noncentral distribution 
when X is partitioned into k sets of variates as in (1.20). Following the 
notations defined by (1.21), (1.22), (1.23), (1.24), and (1.25), the joint 
pdf of 


(k+1) /2 
} 


and 


1/2 
v+q ee p- i 
Z = -XAR `X 
141 \/ 7 (: + yin bir) o) 


Hi)T 
x (Xe as Ri)” RUX«) , 


is given by the lengthy expression
where

$$\boldsymbol{\xi}_{m\cdot(m-1)} = \boldsymbol{\xi}_m - \mathbf{R}_{m(m-1)}\mathbf{R}_{(m-1)}^{-1}\boldsymbol{\xi}_{(m-1)}$$

and

$$\delta_m^2 = \boldsymbol{\xi}_{m\cdot(m-1)}^T\,\mathbf{R}_{mm\cdot(m-1)}^{-1}\,\boldsymbol{\xi}_{m\cdot(m-1)}.$$


5.2 Miller’s Noncentral Multivariate t Distribution


Let $\mathbf{Y}$ have the $p$-variate normal distribution with mean vector $\boldsymbol{\mu}$ and
correlation matrix $\mathbf{R} > 0$. Let $\mathbf{S}$ be distributed independently of $\mathbf{Y}$
according to a $\nu$-variate normal distribution with mean vector $\boldsymbol{\lambda}$ and
correlation matrix $m\mathbf{I}_\nu$. Miller (1968) considered the joint distribution
of

$$\mathbf{X}^T = (X_1, X_2, \ldots, X_p) = \left(\frac{Y_1}{|\mathbf{S}|},\ \frac{Y_2}{|\mathbf{S}|},\ \ldots,\ \frac{Y_p}{|\mathbf{S}|}\right),$$

which he referred to as the generalized $p$-dimensional t random vector.
Assuming $|\mathbf{S}|^2$ has the chi-squared distribution, Miller showed that the
joint pdf of $\mathbf{X}$ is given by


—(v+p)/2 


91-(v+p)/2m 7T v+ 1 
fix) = | 


YE —+ <7R'x) 
T(v/2)r?/2 |R] 


m 
2 
Te ty m (xT R! y) 
xep] pe ihe ree 
JVmx? Ro u 
necp aT : (5:2) 


where $D_{-(\nu+p)}(\cdot)$ is the parabolic cylinder function (see, for example,
Erdélyi et al., 1953). If $\boldsymbol{\mu} = \mathbf{0}$, then (5.2) reduces to


= m ”PT(( + p)/2) 1 Tp-l 
58) =) rae (= +xTRox 


AP v+p vy, IAP 
on {AE 2 ° 2’ 2m(mxTR-'x +1) |’ 


where ${}_1F_1$ is the confluent hypergeometric function (see, for example,
Erdélyi et al., 1953). If both $\boldsymbol{\mu} = \mathbf{0}$ and $\boldsymbol{\lambda} = \mathbf{0}$, then (5.2) reduces
to the usual central multivariate t distribution (1.1) with degrees of
freedom $\nu$ and correlation matrix $\mathbf{R}$. To the best of our knowledge, this
interesting distribution given by (5.2) has not been pursued further since
its introduction some 35 years ago.


5.3 Stepwise Multivariate t Distribution 


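Let $\mathbf{Y}$ be a $p$-variate normal random vector with mean vector $\boldsymbol{\mu}$, common
variance $\sigma^2$, and correlation matrix $\mathbf{R} > 0$. Let $\nu S^2/\sigma^2$ be a chi-squared
random variable with degrees of freedom $\nu$, distributed independently
of $\mathbf{Y}$. Then the joint distribution of

$$X_1 = \frac{Y_1}{S}$$

and

$$X_k = \frac{\sqrt{\nu+k-1}}{\sqrt{1-\bar{r}_k^2}}\cdot\frac{Y_k - \mathbf{r}_{k-1}^T\mathbf{R}_{(k-1)}^{-1}\mathbf{Y}_{(k-1)}}{\sqrt{\nu S^2 + \mathbf{Y}_{(k-1)}^T\mathbf{R}_{(k-1)}^{-1}\mathbf{Y}_{(k-1)}}}, \qquad k = 2, \ldots, p, \tag{5.3}$$

where $\bar{r}_k$ denotes the multiple correlation coefficient between $Y_k$ and
$(Y_1, \ldots, Y_{k-1})$,

$$\mathbf{Y}_{(k)}^T = (Y_1, Y_2, \ldots, Y_k),$$

$$\mathbf{R}_{(k)} = \begin{pmatrix} 1 & r_{12} & \cdots & r_{1k}\\ r_{21} & 1 & \cdots & r_{2k}\\ \vdots & \vdots & & \vdots\\ r_{k1} & r_{k2} & \cdots & 1\end{pmatrix},$$

and

$$\mathbf{r}_k^T = (r_{1,k+1}, r_{2,k+1}, \ldots, r_{k,k+1}),$$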

is known as the stepwise multivariate t distribution. This distribution
has applications in linear multiple regression analysis; for instance,
suppose that a random sample $Y_1, \ldots, Y_n$, corresponding to some nonrandom
values $(x_{1i}, x_{2i})$, $i = 1, \ldots, n$, is available. The null hypothesis
to be tested is that the slopes of the two simple regression lines, $Y$ on
$x_1$ and $Y$ on $x_2$, are both zero. Then, the $X_1$ and $X_2$ above could correspond
to the usual t statistics for testing the two regression coefficients
(Steffens, 1974).
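
To make the stepwise construction concrete, here is a minimal simulation sketch for the uncorrelated special case $\mathbf{R} = \mathbf{I}_p$ with $\boldsymbol{\mu} = \mathbf{0}$, in which (5.3) simplifies to $X_1 = Y_1/S$ and $X_k = \sqrt{\nu+k-1}\,Y_k/\sqrt{\nu S^2 + Y_1^2 + \cdots + Y_{k-1}^2}$. The function name and the default $\sigma = 1$ are our own choices.

```python
import math
import random

def stepwise_t_identity(p, nu, rng, sigma=1.0):
    """One draw of (X_1, ..., X_p) for R = I_p and mu = 0."""
    y = [rng.gauss(0.0, sigma) for _ in range(p)]
    # nu * S^2, where nu * S^2 / sigma^2 is chi-squared with nu degrees of freedom
    nu_s2 = sigma * sigma * rng.gammavariate(nu / 2.0, 2.0)
    x = [y[0] / math.sqrt(nu_s2 / nu)]
    acc = nu_s2
    for k in range(2, p + 1):
        acc += y[k - 2] ** 2  # add Y_{k-1}^2 to the running denominator
        x.append(math.sqrt(nu + k - 1) * y[k - 1] / math.sqrt(acc))
    return x
```

Each denominator absorbs the earlier $Y_j^2$ terms, so the $k$th component gains one degree of freedom per step.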

Steffens (1969a) studied the distribution of (5.3) for the special case
$\mathbf{R} = \mathbf{I}_p$, the $p \times p$ identity matrix. In this case, since $\bar{r}_k = 0$, $\mathbf{r}_{(k-1)} = \mathbf{0}$
and $\mathbf{R}_{(k-1)} = \mathbf{I}_{k-1}$, (5.3) reduces to

$$X_1 = \frac{Y_1}{S}$$

and

$$X_k = \frac{\sqrt{\nu+k-1}\,Y_k}{\sqrt{\nu S^2 + Y_1^2 + \cdots + Y_{k-1}^2}}.$$

If $\boldsymbol{\mu} = \mathbf{0}$, Steffens (1969a) showed that $X_1, X_2, \ldots, X_p$ are independent
Student's t random variables with degrees of freedom $\nu, \nu+1, \ldots, \nu+p-1$,
respectively. This result also holds for general $\mathbf{R}$ (Siotani, 1976, Corollary
3.1). In the noncentral case $\boldsymbol{\mu} \neq \mathbf{0}$, the $X_j$'s are still independent,
but $X_1$ has the noncentral t distribution with degrees of freedom $\nu$ and
noncentrality parameter $\mu_1/\sigma$ while the $X_j$'s ($j = 2, 3, \ldots, p$) have the
doubly noncentral t distributions with degrees of freedom $\nu + j - 1$ and
noncentrality parameter $\mu_j/\sigma$ in the numerator and
$(\mu_1/\sigma)^2 + (\mu_2/\sigma)^2 + \cdots + (\mu_{j-1}/\sigma)^2$ in the denominator. Steffens derived the joint pdf of the
$X_j$'s in the bivariate case as the double infinite series


= exp { (57 + 632) ) 7/2} S (V261)' m 
f (21,22) = SC ae py1/(2l) +1) 09 


PE E. of : (v+k+1)/2
where $\delta_j = \mu_j/\sigma$, $j = 1,2$. For general $p$ and $\mathbf{R}$, Siotani (1976) showed
that if $\boldsymbol{\mu} = \mathbf{0}$, then $X_1, X_2, \ldots, X_p$ are still independent Student's t random
variables with degrees of freedom $\nu, \nu+1, \ldots, \nu+p-1$, respectively.
In the noncentral case, the joint pdf of the $X_j$ in (5.3) generalizes to


_ E 
f (E12) = a (- 5 et) ee 


p a (v+k)/2 
«TI {1+ ea} 


i=l 
n T ((v + ky +--+ +kp)/2)
P ki


(x17) 


x pean | a, 
t=1 {1 + z? /(v + j= ih Gases 


where the $\tau_i$'s are the noncentrality parameters given by


GT RG $1) 
Tt = — m, 


$$\boldsymbol{\xi}_{(k)}^T = (\xi_1, \xi_2, \ldots, \xi_k),$$


and $\xi_j = \mu_j/\sigma$.


5.4 Siotani’s Noncentral Multivariate t Distribution 


In Section 4.5, we discussed a bivariate generalization of the doubly
noncentral univariate t distribution given by (4.9). Siotani (1976) provided
a multivariate generalization of (4.9) by observing that the pdf of
$S^* = S/\sqrt{\nu}$ (where $S$ is a noncentral chi-squared random variable with
degrees of freedom $\nu$ and noncentrality parameter $\lambda$) can be expressed
as the Poisson mixture

$$f(s^*) = \sum_{k=0}^{\infty} P_k(\lambda)\,f_{\nu+2k}(s^*),$$

where

$$P_k(\lambda) = \frac{\exp(-\lambda)\lambda^k}{k!}$$

is the $k$th probability of the Poisson distribution with parameter $\lambda$ and

$$f_{\nu+2k}(s^*) = \frac{2\,(\nu/2)^{(\nu+2k)/2}}{\Gamma((\nu+2k)/2)}\,(s^*)^{\nu+2k-1}\exp\left(-\frac{\nu}{2}s^{*2}\right).$$

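He defined $\mathbf{X} = \mathbf{Y}/S^*$ to have the doubly noncentral multivariate t
distribution, where $\mathbf{Y}$ is a multivariate normal random vector with mean
vector $\boldsymbol{\mu}$, unit variances, and correlation matrix $\mathbf{R}$. The joint pdf of $\mathbf{X}$
is easily obtained as a Poisson mixture of the noncentral pdf (5.1) with
$\nu + 2k$ in place of $\nu$ in the arguments of gamma functions and in the
power of $1+\mathbf{x}^T\mathbf{R}^{-1}\mathbf{x}/\nu$, that is,

$$f(\mathbf{x}) = \sum_{k=0}^{\infty} P_k(\lambda)\,h_{\nu,k}(\mathbf{x}), \tag{5.4}$$

where

$$h_{\nu,k}(\mathbf{x}) = \frac{\exp\left(-\frac{1}{2}\boldsymbol{\xi}^T\mathbf{R}^{-1}\boldsymbol{\xi}\right)\Gamma((\nu+2k+p)/2)}{(\nu\pi)^{p/2}\,\Gamma((\nu+2k)/2)\,|\mathbf{R}|^{1/2}}\left[1+\frac{1}{\nu}\mathbf{x}^T\mathbf{R}^{-1}\mathbf{x}\right]^{-(\nu+2k+p)/2}\sum_{l=0}^{\infty}\frac{\Gamma((\nu+2k+p+l)/2)}{l!\,\Gamma((\nu+2k+p)/2)}\left[\frac{\sqrt{2}\,\boldsymbol{\xi}^T\mathbf{R}^{-1}\mathbf{x}}{\sqrt{\nu+\mathbf{x}^T\mathbf{R}^{-1}\mathbf{x}}}\right]^l.$$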

l=0 


5.5 Arellano-Valle and Bolfarine’s Generalized t Distribution 


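Arellano-Valle and Bolfarine (1995) considered what is being referred to
as a generalized t distribution within the class of elliptical distributions.
The distribution is defined by

$$\mathbf{X} = \boldsymbol{\mu} + V^{1/2}\mathbf{Y}, \tag{5.5}$$

where $V$ has the inverse gamma distribution given by the pdf

$$f(v) = \frac{(\lambda/2)^{\nu/2}}{\Gamma(\nu/2)}\,v^{-(\nu/2+1)}\exp\left(-\frac{\lambda}{2v}\right), \qquad v > 0,$$

and $\mathbf{Y}$ is distributed independently of $V$ according to a $p$-dimensional
normal distribution with mean vector $\mathbf{0}$ and covariance matrix $\mathbf{R}$. We
shall write $\mathbf{X} \sim t_p(\boldsymbol{\mu}, \mathbf{R}; \lambda, \nu)$. When $\lambda = \nu$, this distribution reduces to
the usual multivariate t distribution (1.1) with mean vector $\boldsymbol{\mu}$, correlation
matrix $\mathbf{R}$, and degrees of freedom $\nu$. For $\mathbf{R} > 0$, the joint pdf of
$\mathbf{X} \sim t_p(\boldsymbol{\mu}, \mathbf{R}; \lambda, \nu)$ is

$$f(\mathbf{x}) = \frac{\lambda^{\nu/2}\,\Gamma((\nu+p)/2)}{\pi^{p/2}\,\Gamma(\nu/2)\,|\mathbf{R}|^{1/2}}\left[\lambda + (\mathbf{x}-\boldsymbol{\mu})^T\mathbf{R}^{-1}(\mathbf{x}-\boldsymbol{\mu})\right]^{-(\nu+p)/2}. \tag{5.6}$$

It is easy to observe from (5.5) that

$$E(\mathbf{X}) = \boldsymbol{\mu}, \qquad \nu > 1 \tag{5.7}$$

and

$$\mathrm{Var}(\mathbf{X}) = \frac{\lambda}{\nu-2}\mathbf{R}, \qquad \nu > 2. \tag{5.8}$$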
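
The scale-mixture definition (5.5) translates directly into a sampler. The following sketch handles the bivariate case with a single correlation `rho` and assumes the inverse gamma law for $V$ means that $1/V$ is gamma distributed with shape $\nu/2$ and rate $\lambda/2$; the function and argument names are illustrative.

```python
import math
import random

def generalized_t_2d(mu, rho, lam, nu, rng):
    """One draw of X = mu + sqrt(V) * Y for p = 2, with corr(Y1, Y2) = rho."""
    # 1/V ~ Gamma(shape nu/2, rate lam/2); gammavariate takes a scale, hence 2/lam
    v = 1.0 / rng.gammavariate(nu / 2.0, 2.0 / lam)
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    y1 = z1
    y2 = rho * z1 + math.sqrt(1.0 - rho * rho) * z2
    s = math.sqrt(v)
    return mu[0] + s * y1, mu[1] + s * y2
```

For $\nu > 2$ the sample covariance matrix of many draws should be close to $\lambda/(\nu-2)$ times the correlation matrix, in line with (5.8).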


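Furthermore, for an $m \times 1$ vector $\boldsymbol{\eta}$ and an $m \times p$ matrix $\mathbf{B}$,

$$\mathbf{Z} = \boldsymbol{\eta} + \mathbf{B}\mathbf{X} = (\boldsymbol{\eta} + \mathbf{B}\boldsymbol{\mu}) + V^{1/2}\mathbf{B}\mathbf{Y} \sim t_m\left(\boldsymbol{\eta} + \mathbf{B}\boldsymbol{\mu}, \mathbf{B}\mathbf{R}\mathbf{B}^T; \lambda, \nu\right) \tag{5.9}$$

since $\mathbf{B}\mathbf{Y}$ has the $m$-dimensional normal distribution with mean vector
$\mathbf{0}$ and covariance matrix $\mathbf{B}\mathbf{R}\mathbf{B}^T$. Now let

$$\mathbf{X} = \begin{pmatrix}\mathbf{X}_1\\ \mathbf{X}_2\end{pmatrix}, \tag{5.10}$$

$$\boldsymbol{\mu} = \begin{pmatrix}\boldsymbol{\mu}_1\\ \boldsymbol{\mu}_2\end{pmatrix}, \tag{5.11}$$

and

$$\mathbf{R} = \begin{pmatrix}\mathbf{R}_{11} & \mathbf{R}_{12}\\ \mathbf{R}_{21} & \mathbf{R}_{22}\end{pmatrix}, \tag{5.12}$$

where $\mathbf{X}_1$ is $m \times 1$, $\mathbf{R}_{11}$ is $m \times m$ and so on. Taking $\mathbf{B} = [\mathbf{I}_m, \mathbf{0}]$ in (5.9),
note that $\mathbf{X}_1 \sim t_m(\boldsymbol{\mu}_1, \mathbf{R}_{11}; \lambda, \nu)$. By symmetry,
$\mathbf{X}_2 \sim t_{p-m}(\boldsymbol{\mu}_2, \mathbf{R}_{22}; \lambda, \nu)$. Assuming $\mathbf{R} > 0$, let

$$\boldsymbol{\mu}_{1\cdot 2}(\mathbf{x}_2) = \boldsymbol{\mu}_1 + \mathbf{R}_{12}\mathbf{R}_{22}^{-1}(\mathbf{x}_2 - \boldsymbol{\mu}_2),$$

$$\mathbf{R}_{11\cdot 2} = \mathbf{R}_{11} - \mathbf{R}_{12}\mathbf{R}_{22}^{-1}\mathbf{R}_{21},$$

and

$$q(\mathbf{x}_2) = (\mathbf{x}_2 - \boldsymbol{\mu}_2)^T\mathbf{R}_{22}^{-1}(\mathbf{x}_2 - \boldsymbol{\mu}_2).$$

Using the fact that

$$|\mathbf{R}| = |\mathbf{R}_{11\cdot 2}|\,|\mathbf{R}_{22}|$$

and

$$(\mathbf{x}-\boldsymbol{\mu})^T\mathbf{R}^{-1}(\mathbf{x}-\boldsymbol{\mu}) = \left(\mathbf{x}_1 - \boldsymbol{\mu}_{1\cdot 2}(\mathbf{x}_2)\right)^T\mathbf{R}_{11\cdot 2}^{-1}\left(\mathbf{x}_1 - \boldsymbol{\mu}_{1\cdot 2}(\mathbf{x}_2)\right) + q(\mathbf{x}_2),$$

note that the conditional pdf of $\mathbf{X}_1$ given $\mathbf{X}_2 = \mathbf{x}_2$ is given by

$$f(\mathbf{x}_1 \mid \mathbf{x}_2) = \frac{\Gamma((\nu+p)/2)\left\{\lambda + q(\mathbf{x}_2)\right\}^{(\nu+p-m)/2}}{\pi^{m/2}\,\Gamma((\nu+p-m)/2)\,|\mathbf{R}_{11\cdot 2}|^{1/2}}\left[\lambda + q(\mathbf{x}_2) + \left(\mathbf{x}_1 - \boldsymbol{\mu}_{1\cdot 2}(\mathbf{x}_2)\right)^T\mathbf{R}_{11\cdot 2}^{-1}\left(\mathbf{x}_1 - \boldsymbol{\mu}_{1\cdot 2}(\mathbf{x}_2)\right)\right]^{-(\nu+p)/2}.$$

This means that

$$\mathbf{X}_1 \mid \mathbf{X}_2 = \mathbf{x}_2 \sim t_m\left(\boldsymbol{\mu}_{1\cdot 2}(\mathbf{x}_2), \mathbf{R}_{11\cdot 2}; \lambda + q(\mathbf{x}_2), \nu + p - m\right). \tag{5.13}$$

Note that when $\lambda = \nu$, $\mathbf{X}_1 \mid \mathbf{X}_2 = \mathbf{x}_2 \sim t_m(\boldsymbol{\mu}_{1\cdot 2}(\mathbf{x}_2), \mathbf{R}_{11\cdot 2}; \nu + q(\mathbf{x}_2), \nu + p - m)$.
Since $q(\mathbf{x}_2) \neq p - m$, this shows that the usual t distribution
does not retain its conditional distributions (see Section 1.11). Finally,
it follows from (5.13) and (5.7)-(5.8) that

$$E(\mathbf{X}_1 \mid \mathbf{X}_2) = \boldsymbol{\mu}_1 + \mathbf{R}_{12}\mathbf{R}_{22}^{-1}(\mathbf{X}_2 - \boldsymbol{\mu}_2)$$

and

$$\mathrm{Cov}(\mathbf{X}_1 \mid \mathbf{X}_2) = \frac{\lambda + (\mathbf{X}_2 - \boldsymbol{\mu}_2)^T\mathbf{R}_{22}^{-1}(\mathbf{X}_2 - \boldsymbol{\mu}_2)}{\nu + p - m - 2}\left(\mathbf{R}_{11} - \mathbf{R}_{12}\mathbf{R}_{22}^{-1}\mathbf{R}_{21}\right).$$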
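
The conditional result (5.13) is mechanical to apply. As a small worked sketch for $p = 2$ and $m = 1$ (scalar blocks, so no matrix inversion is needed), the function below returns the conditional location, scale matrix entry, updated $\lambda$ and $\nu$, and the conditional variance; its name and argument layout are our own.

```python
def conditional_params(mu, R, lam, nu, x2):
    """Parameters of X1 | X2 = x2 for a bivariate generalized t (p = 2, m = 1)."""
    (r11, r12), (r21, r22) = R
    mu_c = mu[0] + r12 / r22 * (x2 - mu[1])  # conditional location
    r_c = r11 - r12 * r21 / r22              # R_{11.2}
    q = (x2 - mu[1]) ** 2 / r22              # q(x2)
    lam_c, nu_c = lam + q, nu + 1            # lambda + q(x2), nu + p - m
    cond_var = lam_c / (nu_c - 2.0) * r_c    # finite only if nu_c > 2
    return mu_c, r_c, lam_c, nu_c, cond_var
```

With $\mathbf{R} = \mathbf{I}_2$ the conditional variance is $(\lambda + x_2^2)/(\nu - 1)$, a linear function of $x_2^2$ with slope $a = 1/(\nu-1)$ and intercept $b = \lambda/(\nu-1)$, so that $b/a = \lambda$ and $(a+1)/a = \nu$.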


Arellano-Valle and Bolfarine (1995) also presented characterizations
of the generalized t distribution (5.5) in terms of marginal distributions,
conditional distributions, quadratic forms, and within the class of compound
normal distributions. Briefly, these characterizations are

• Let $\mathbf{X}$ have the $p$-variate elliptically symmetric distribution with mean
  vector $\boldsymbol{\mu}$ and covariance matrix $\mathbf{R}$ (for a definition of an elliptically
  symmetric distribution see, for example, Fang et al., 1990). Then, any
  marginal distribution is a generalized t distribution if and only if $\mathbf{X}$
  has a generalized t distribution.

• Let $\mathbf{X} = (\mathbf{X}_1^T, \mathbf{X}_2^T)^T$ have the $p$-variate elliptically symmetric distribution
  with mean vector $\boldsymbol{\mu}$ and covariance matrix $\mathbf{R}$, where $\mathbf{X}_1$ is
  $m \times 1$. Then, the conditional distribution of $\mathbf{X}_1$ given $\mathbf{X}_2$ is the generalized
  $m$-variate t distribution if and only if the distribution of $\mathbf{X}$ is
  the generalized $p$-variate t distribution. The proof of this result, which
  assumes the existence of a density, is similar to the proof considered
  in the pioneering paper by Kelker (1970) for the characterization of
  the multivariate normal distribution.

• Let $\mathbf{X}$ have the $p$-variate elliptically symmetric distribution with mean
  vector $\mathbf{0}$ and covariance matrix $\mathbf{I}_p$, and let $\mathbf{A}$ be a symmetric $p \times p$ matrix.
  Then, $\mathbf{X}^T\mathbf{A}\mathbf{X} \sim (m\lambda/\nu)F_{m,\nu}$ if and only if $\mathbf{X} \sim t_p(\mathbf{0}, \mathbf{I}_p; \lambda, \nu)$,
  $\mathbf{A}^2 = \mathbf{A}$, and $\mathrm{rank}(\mathbf{A}) = m$. This result is proved by utilizing Anderson
  and Fang's (1987) assertion on the spherical distributions that
  put zero mass at the origin.


The fourth characterization within the class of compound normal distributions
is a consequence of a well known theorem due to Diaconis and
Ylvisaker (1979), which asserts that, in the regular exponential family
with the natural parameterization, if the posterior expectation is linear,
then the prior distribution must be conjugate. It states that if
$X_1, X_2, \ldots$ is an infinite sequence of orthogonally invariant random variables
(which means that for each $p$, $\mathbf{X} = (X_1, \ldots, X_p)^T$ and $\boldsymbol{\Gamma}\mathbf{X}$ are
identically distributed, for all $p \times p$ orthogonal matrices $\boldsymbol{\Gamma}$) such that
$X_1 = 0$ with probability zero and

$$\mathrm{Var}(X_2 \mid X_1) = b + aX_1^2, \qquad 0 < a < 1,\ b > 0, \tag{5.14}$$

then $\mathbf{X}$ is distributed as $t_p(\mathbf{0}, \mathbf{I}_p; b/a, (a+1)/a)$. The converse also holds.
Arellano-Valle et al. (1994) pointed out that (5.14) could be extended
to yield a location mixture of generalized t distributions as follows. Let
$X_1, X_2, \ldots$ be an infinite sequence of random variables such that for each
$p$, $\mathbf{X} = (X_1, \ldots, X_p)^T$ and $\boldsymbol{\Gamma}\mathbf{X}$ are identically distributed, for all $p \times p$
orthogonal matrices $\boldsymbol{\Gamma}$ satisfying $\boldsymbol{\Gamma}\mathbf{1}_p = \mathbf{1}_p$ (where $\mathbf{1}_p$ is a $p$-dimensional
vector of 1's). Under this assumption there exist random variables $M$
and $V$ such that, conditional on $M$ and $V$, $X_1, X_2, \ldots$ are independent
and normally distributed with mean $M$ and variance $V$. Actually, $M$
and $V$ can be interpreted as the limits

$$\frac{1}{n}\sum_{i=1}^{n} X_i \to M$$


and 


as $n \to \infty$, where the convergence is with probability 1. Furthermore, if

$$\mathrm{Var}\left\{(X_2 - M) \mid X_1, M\right\} = a(X_1 - M)^2 + b, \qquad 0 < a < 1,\ b > 0, \tag{5.15}$$

then $\mathbf{X}$ is a location mixture of $t_p(M\mathbf{1}_p, \mathbf{I}_p; b/a, (a+1)/a)$ and, in addition,
$M$ and $V$ are independent. Because of the form of the conditions
(5.14)-(5.15), these two results are known as the predictivistic characterizations
of the generalized t distribution. These results could be
extended further to the matrix-variate t distributions (see Section 5.11).


5.6 Fang et al.’s Asymmetric Multivariate t Distribution 


Fang et al. (2002) introduced an asymmetric $p$-variate t distribution with
degrees of freedom $(m, m_1, \ldots, m_p)$. Its joint pdf is given by

$$f(\mathbf{x}) = \psi\left(T_m^{-1}\left(T_{m_1}(x_1)\right), \ldots, T_m^{-1}\left(T_{m_p}(x_p)\right); \mathbf{R}\right)\prod_{i=1}^{p} t_{m_i}(x_i), \tag{5.16}$$

where

$$\psi(y_1, \ldots, y_p; \mathbf{R}) = \frac{\Gamma((m+p)/2)\,\Gamma^{p-1}(m/2)}{\Gamma^{p}((m+1)/2)\,|\mathbf{R}|^{1/2}}\left(1+\frac{\mathbf{y}^T\mathbf{R}^{-1}\mathbf{y}}{m}\right)^{-(m+p)/2}\prod_{i=1}^{p}\left(1+\frac{y_i^2}{m}\right)^{(m+1)/2}.$$

Here, $\mathbf{R}$ denotes the correlation matrix, and $t_m$ and $T_m$, respectively,
denote the pdf and the cdf of the Student's t distribution with degrees
of freedom $m$. Note that the marginals of (5.16) have different degrees
of freedom. In the particular case $m_i = m$, $i = 1, \ldots, p$, (5.16) reduces
to the usual $p$-variate t distribution with degrees of freedom $m$.


5.7 Gupta’s Skewed Multivariate t Distribution 


In the next four sections (starting with this section), we shall discuss
skewed multivariate t distributions, a topic that has received special
attention in the last few years, following the introduction of the skewed
multivariate normal distribution in the classical paper by Azzalini and
Dalla Valle (1996). A careful reader will observe that the possibilities of
constructing skewed multivariate t distributions are practically limitless.

A $p$-variate random vector $\mathbf{Y} = (Y_1, Y_2, \ldots, Y_p)^T$ is said to have the
skewed normal distribution if its joint pdf is given by

$$f_{\mathbf{Y}}(\mathbf{y}) = 2\phi_p(\mathbf{y}; \boldsymbol{\Sigma})\,\Phi\left(\boldsymbol{\alpha}^T\mathbf{y}\right), \qquad \mathbf{y} \in \mathbb{R}^p, \tag{5.17}$$

where $\boldsymbol{\Sigma} > 0$ (with $\mathbf{R}$ denoting the corresponding correlation matrix),
$\boldsymbol{\alpha} \in \mathbb{R}^p$, $\phi_p(\mathbf{y}; \boldsymbol{\Sigma})$ is the $p$-dimensional normal density with zero means
and covariance matrix $\boldsymbol{\Sigma}$, and $\Phi(\cdot)$ is the cdf of the standard normal
distribution. Let $W$ be a chi-squared random variable with degrees of
freedom $\nu$, distributed independently of $\mathbf{Y}$. Gupta (2000) defined the
joint distribution of

$$X_j = \frac{Y_j}{\sqrt{W/\nu}}, \qquad j = 1, 2, \ldots, p \tag{5.18}$$

as the skewed multivariate t distribution with degrees of freedom $\nu$. The
joint pdf of (5.18) is given by

$$f_{\mathbf{X}}(\mathbf{x}) = 2f_\nu(\mathbf{x})\,F_{\nu+p}\left(\frac{\boldsymbol{\alpha}^T\mathbf{x}\,\sqrt{\nu+p}}{\sqrt{\nu+\mathbf{x}^T\boldsymbol{\Sigma}^{-1}\mathbf{x}}}\right), \tag{5.19}$$

Fig. 5.1. Fang et al.'s asymmetric t pdf (5.16) in the bivariate case (a) $m = 2$,
$m_1 = 10$, $m_2 = 10$, and $r_{12} = 0$; (b) $m = 2$, $m_1 = 10$, $m_2 = 2$, and $r_{12} = 0$; (c)
$m = 2$, $m_1 = 10$, $m_2 = 10$, and $r_{12} = 0.5$; and (d) $m = 2$, $m_1 = 10$, $m_2 = 10$,
and $r_{12} = 0.9$

where $\mathbf{x} \in \mathbb{R}^p$. Here, $f_k$ and $F_k$, respectively, denote the joint pdf of the
central $p$-variate t distribution with correlation matrix $\mathbf{R}$ and degrees
of freedom $k$ and the cdf of the Student's t distribution with degrees of
freedom $k$. From the definition (5.18) and the joint pdf (5.19), Gupta
noted the following properties

• If $\boldsymbol{\alpha} = \mathbf{0}$, then (5.19) reduces to the central $p$-variate t distribution
  with correlation matrix $\mathbf{R}$ and degrees of freedom $\nu$.

• The skewed multivariate t distribution approaches the skewed multivariate
  normal distribution as $\nu \to \infty$, that is,

  $$\lim_{\nu\to\infty} f_{\mathbf{X}}(\mathbf{x}; \boldsymbol{\alpha}) = 2\phi_p(\mathbf{x}; \boldsymbol{\Sigma})\,\Phi\left(\boldsymbol{\alpha}^T\mathbf{x}\right).$$

• Since $Y_j^2$ is a chi-squared random variable with degree of freedom 1,

  $$X_j^2 = \frac{Y_j^2}{W/\nu} \sim F(1, \nu),$$

  the F distribution with degrees of freedom 1 and $\nu$; furthermore, the
  joint distribution of $(X_1^2, X_2^2, \ldots, X_p^2)$ is multivariate F with parameters
  $1, 1, \ldots, 1, \nu + p$.

• Since $\mathbf{y}^T\boldsymbol{\Sigma}^{-1}\mathbf{y}$ is a chi-squared random variable with degrees of freedom
  $p$, the quadratic form

  $$\mathbf{x}^T\boldsymbol{\Sigma}^{-1}\mathbf{x} = \frac{\mathbf{y}^T\boldsymbol{\Sigma}^{-1}\mathbf{y}}{W/\nu} \sim pF_{p,\nu};$$

  note that, as in the case for multivariate normal, the distribution
  of this quadratic form does not depend on $\boldsymbol{\alpha}$.
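
The definition (5.18) and the F-distribution property can be illustrated in the univariate case. The sketch below generates a skew normal variate through the representation $Y = \delta|U_0| + \sqrt{1-\delta^2}\,U_1$ with $\delta = \alpha/\sqrt{1+\alpha^2}$ and independent standard normal $U_0$, $U_1$ (a standard device that is our choice here, not part of the text), and then divides by $\sqrt{W/\nu}$. The closed-form mean used for comparison follows from $E(Y) = \delta\sqrt{2/\pi}$ and the independence of $Y$ and $W$.

```python
import math
import random

def gupta_skew_t_1d(alpha, nu, rng):
    """One draw of X = Y / sqrt(W / nu), with Y skew normal and W chi-squared(nu)."""
    delta = alpha / math.sqrt(1.0 + alpha * alpha)
    y = (delta * abs(rng.gauss(0.0, 1.0))
         + math.sqrt(1.0 - delta * delta) * rng.gauss(0.0, 1.0))
    w = rng.gammavariate(nu / 2.0, 2.0)
    return y / math.sqrt(w / nu)

def skew_t_mean_1d(alpha, nu):
    """E(X) = delta * sqrt(nu / pi) * Gamma((nu - 1)/2) / Gamma(nu/2), for nu > 1."""
    delta = alpha / math.sqrt(1.0 + alpha * alpha)
    return (delta * math.sqrt(nu / math.pi)
            * math.exp(math.lgamma((nu - 1.0) / 2.0) - math.lgamma(nu / 2.0)))
```

The second moment of the simulated values stays near $\nu/(\nu-2)$ whatever the value of `alpha`, reflecting that $X^2 \sim F(1,\nu)$ does not depend on the skewness parameter.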


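The special case of (5.19) for $\boldsymbol{\Sigma} = \mathbf{I}_p$ is called the standard skewed
multivariate t distribution. If, in addition, $\nu = 1$, then it is defined as
the skewed multivariate Cauchy distribution with the joint pdf

$$f_{\mathbf{X}}(\mathbf{x}; \boldsymbol{\alpha}) = \frac{2\,\Gamma((p+1)/2)}{\pi^{(p+1)/2}}\left(1+\sum_{j=1}^{p}x_j^2\right)^{-(p+1)/2} F_{p+1}\left(\frac{\boldsymbol{\alpha}^T\mathbf{x}\,\sqrt{p+1}}{\sqrt{1+\sum_{j=1}^{p}x_j^2}}\right), \qquad \mathbf{x} \in \mathbb{R}^p.$$

It should be noted that the above does not belong to the class of elliptically
symmetric distributions, whereas the multivariate Cauchy does.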

Using results in Azzalini and Dalla Valle (1996) and Gupta and Kollo 
(2000), the mean vector and the covariance matrix associated with (5.19) 
are calculated as 


—(p+1)/2 


2v La 
HRS Er 


y > 2, 


Cov (X) v | 2(v + 4jaaT E 


Taea Pav —2)(1+ aFZa)|’ 
v >A, 




respectively. Furthermore, using the definition (5.18), the product mo- 
ments are easily obtained as 


P 
Phe tajat = E Ú t 
j=1 


vB (W ve(H “| 
vv ( wh) =r) Tj 
2 T(v/2) e (fr j 


for $r < \nu/2$, where $r = r_1 + r_2 + \cdots + r_p$. If $Y_1, Y_2, \ldots, Y_p$ are mutually
independent, then the right-hand side can be easily calculated.

Branco and Dey (2001) noted that the joint pdf (5.19) is a particular 
case of a general class of skewed multivariate elliptical distributions. 
Actually, the joint pdf of the general class takes the form 


f) = fuse (x) Free (A7 Œœ- y), (5.20) 


where v* =v + p, 


T” = r+(x- u)" R(x- p), 
d= al R-} 
~ Vi-aTR-1a’ 
r(Y +p)/2) won RO] 
Bao Ee A E RRT 
fer) TPPA RF bs i | : 
and 
+\y"/ * x 7 
Fp (2) = (rt) PT + 1)/2) (rt +42)” HY? 


varT (v*/2) -00 


Note that $f_{\nu,\tau}(\mathbf{x})$ is the generalized t pdf described in equation (5.6),
and that $F_{\nu^*,\tau^*}(x)$ is the cdf of a generalized version of the Student's t
distribution. The mean and the variance of the univariate marginals of
(5.20) are
_ af ((y—1)/2) fv 
i T (v/2) E
(provided v > 1) and
yay [((v—1)/2)]? 
Var OO =. = Sa 
ne) a r T (v/2) | 


(provided v > 2), respectively. 


5.8 Sahu et al.’s Skewed Multivariate t Distribution 


Using transformation and conditioning, Sahu et al. (2000) obtained a
skewed multivariate t distribution given by the joint pdf


y+p 
f(x) = tm, (x; p, R + D?) Tmvim |4/—~ 
( ) Trt v+q(y) 


x (1 -~D(R+D?)* pb)” D (R +D?) | 
(5.21) 


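where $\mathbf{y} = \mathbf{x} - \boldsymbol{\mu}$, $q(\mathbf{y}) = \mathbf{y}^T(\mathbf{R} + \mathbf{D}^2)^{-1}\mathbf{y}$, and $\mathbf{D}$ is a diagonal matrix
with the skewness parameters $\delta_1, \ldots, \delta_m$. In (5.21), $t_{m,\nu}(\boldsymbol{\mu}, \boldsymbol{\Omega})$ denotes
the usual $m$-variate t density with mean vector $\boldsymbol{\mu}$, correlation matrix $\boldsymbol{\Omega}$,
and degrees of freedom $\nu$. Furthermore, $T_{m,\nu+m}(\cdot)$ denotes the joint cdf
of $t_{m,\nu+m}(\mathbf{0}, \mathbf{I})$. The mean and the variance of this skewed t distribution
are given by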


vT ((v — 1)/2) 


E(X) = p+ 1 TOR 
and 
v v (T((v—1)/2)\" 
om = men (ey 


(provided $\nu > 2$), respectively, where $\boldsymbol{\delta} = (\delta_1, \ldots, \delta_m)^T$. The multivariate
skewness measure $\beta_{1,m}$ (Mardia, 1970b) can be calculated in analytic
form. The expression does not simplify and involves nonlinear interactions
between the degrees of freedom ($\nu$) and the skewness parameter
$\delta$ when $\mathbf{D} = \delta\mathbf{I}$. However, $\beta_{1,m}$ approaches $\pm 1$ as $\delta \to \pm\infty$. Sahu et
al. (2000) discussed an application of this model in Bayesian regression
models.


5.9 Azzalini and Capitanio’s Skewed Multivariate t
Distribution


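A slight extension of the skewed normal distribution given in (5.17) is

$$f_{\mathbf{Y}}(\mathbf{y}) = 2\phi_p(\mathbf{y} - \boldsymbol{\xi}; \boldsymbol{\Sigma})\,\Phi\left(\boldsymbol{\alpha}^T\mathbf{W}^{-1}(\mathbf{y} - \boldsymbol{\xi})\right), \tag{5.22}$$

where $\mathbf{y} \in \mathbb{R}^p$, $\boldsymbol{\xi} \in \mathbb{R}^p$, $\mathbf{W} = \mathrm{diag}(\sqrt{\sigma_{11}}, \ldots, \sqrt{\sigma_{pp}})$, and the rest is as
defined in (5.17). In the particular case $\boldsymbol{\xi} = \mathbf{0}$, (5.22) reduces to (5.17).
Starting with a random vector $\mathbf{Y}$ having the pdf (5.22) with $\boldsymbol{\xi} = \mathbf{0}$,
Azzalini and Capitanio (2002) defined a skewed t variate as the scale
mixture

$$\mathbf{X} = \boldsymbol{\xi} + \mathbf{Y}/\sqrt{V}, \tag{5.23}$$

where $\nu V$ is distributed independently of $\mathbf{Y}$ according to a chi-squared
distribution with degrees of freedom $\nu$. Simple calculations using a preliminary
result on Gamma variates show that the joint pdf of $\mathbf{X}$ is

$$f_{\mathbf{X}}(\mathbf{x}) = 2f_\nu(\mathbf{x})\,F_{\nu+p}\left(\boldsymbol{\alpha}^T\mathbf{W}^{-1}(\mathbf{x} - \boldsymbol{\xi})\sqrt{\frac{\nu+p}{Q+\nu}}\right),$$

where $Q = (\mathbf{x} - \boldsymbol{\xi})^T\mathbf{R}^{-1}(\mathbf{x} - \boldsymbol{\xi})$ and $f_k$, $F_k$ are as defined in (5.19).
Note that this pdf coincides with that of Branco and Dey (2001) given
in (5.20). In the standard case $\boldsymbol{\xi} = \mathbf{0}$ and $\boldsymbol{\Sigma} = \mathbf{R}$, the joint cdf of $\mathbf{X}$ can
be represented as

$$F_{\mathbf{X}}(\mathbf{x}) = 2\Pr\left(-U_0/\sqrt{V} \le 0,\ \mathbf{U}/\sqrt{V} \le \mathbf{x}\right), \tag{5.24}$$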


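where $(U_0, \mathbf{U}^T)^T$ has the $(p+1)$-dimensional normal distribution with
zero means and covariance matrix given by

$$\begin{pmatrix} 1 & \boldsymbol{\delta}^T\\ \boldsymbol{\delta} & \mathbf{R}\end{pmatrix}, \qquad \boldsymbol{\delta} = \frac{\mathbf{R}\boldsymbol{\alpha}}{\sqrt{1+\boldsymbol{\alpha}^T\mathbf{R}\boldsymbol{\alpha}}}.$$

The representation (5.24) can also be written in terms of a $(p+1)$-dimensional
t distribution.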
In the case $\boldsymbol{\xi} = \mathbf{0}$, simple expressions for the moments of $\mathbf{X}$ can be

obtained. Defining 
de r(e- 
n T(v/2) : 


where
and provided that v > 1, one obtains
E(Xk) = Wkkhk, 


Y 
E(X) = 5 Vki 


3v? 


E(Xk) = ug gie 


E(X) = Wau, 


E(XXT) = 


v{3- Tô 
Skewness (X4) = pk pE — 2 + wrn) 


and 


3p? Avy! p (3 ~ 575) 
v= 2)v=—4.— v—3 


2 
6v ye" u Ts\? a > 
UPET -3 (ô ô) y—2 KH 


v >4. 


Kurtosis (X) = i 
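
The scale mixture (5.23) is easy to simulate, which gives a quick numerical check on the first moment. The sketch below is an illustration only and covers the univariate standard case (p = 1, ξ = 0, R = 1). It assumes the usual additive representation of a skew normal variate, δ|U_0| + √(1 − δ²) U_1 with δ = α/√(1 + α²), and the closed-form mean δ(ν/π)^{1/2} Γ((ν − 1)/2)/Γ(ν/2). The function names are hypothetical.

```python
import math
import random


def skew_t_mean(alpha, nu):
    """Mean of the standard univariate skewed t (xi = 0, R = 1); needs nu > 1."""
    delta = alpha / math.sqrt(1.0 + alpha * alpha)
    return (delta * math.sqrt(nu / math.pi)
            * math.gamma((nu - 1.0) / 2.0) / math.gamma(nu / 2.0))


def sample_skew_t(alpha, nu, rng):
    """One draw of X = Y / sqrt(V), as in (5.23), with xi = 0 and p = 1."""
    delta = alpha / math.sqrt(1.0 + alpha * alpha)
    # Skew normal Y with pdf 2*phi(y)*Phi(alpha*y), built from two
    # independent standard normals (assumed additive representation).
    y = (delta * abs(rng.gauss(0.0, 1.0))
         + math.sqrt(1.0 - delta * delta) * rng.gauss(0.0, 1.0))
    # nu*V is chi-squared with nu degrees of freedom: Gamma(nu/2, scale 2).
    v = rng.gammavariate(nu / 2.0, 2.0) / nu
    return y / math.sqrt(v)


def monte_carlo_mean(alpha, nu, n, seed=0):
    """Average of n simulated skewed t variates."""
    rng = random.Random(seed)
    return sum(sample_skew_t(alpha, nu, rng) for _ in range(n)) / n
```

For moderate ν the simulated average from `monte_carlo_mean` should agree with `skew_t_mean` to within Monte Carlo error; for ν close to 1 the agreement degrades because the variance is infinite.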


Properties concerning linear and quadratic forms of X can also be de-
rived. For example, if a ∈ R^m and A is an m × p constant matrix of rank
m, then the affine transformation a + AX will also follow the skewed t
distribution given by (5.23) with the parameters ξ, Σ, and α replaced
by a + Aξ, Σ', and α', respectively (the degrees of freedom ν remain
unchanged), where

\Sigma' = A \Sigma A^T,

\alpha' = \frac{W' (\Sigma')^{-1} B^T \alpha}{\sqrt{1 + \alpha^T \left( \Sigma'' - B (\Sigma')^{-1} B^T \right) \alpha}},

W' is the diagonal matrix of the square roots of the diagonal elements
of Σ' (the analog of W), B = W^{-1} Σ A^T, and Σ'' is given by Σ = W Σ'' W. Also,
for appropriate choices of B (a symmetric p × p matrix), the quadratic
form Q = (X − ξ)^T B (X − ξ) can be shown to have the f F_{f,ν} distribution
for some degrees of freedom f. For details see Azzalini and Capitanio
(1999) and Capitanio et al. (2002).

A further extension of (5.17) examined independently by Arnold and
Beaver (2000) and Capitanio et al. (2002) is of the form


f_Y(y) = \phi_p(y - \xi; \Sigma) \, \Phi\left( \alpha_0 + \alpha^T W^{-1} (y - \xi) \right) / \Phi(\tau),    (5.25)


where y ∈ R^p, τ ∈ R, α_0 = τ/√(1 − δ^T R^{-1} δ), and the rest are as defined
in (5.22). In the particular case τ = 0, (5.25) reduces to (5.22). Taking
Y in (5.23) to have the pdf (5.25) with ξ = 0, one obtains an extended
skewed t distribution for X. The corresponding joint pdf for X is quite
complicated, but the joint cdf can be represented as


F_X(x) = \Pr\left( -(U_0 + \tau)/\sqrt{V} \le 0, \; U/\sqrt{V} \le x \right) / \Phi(\tau)


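(compare with (5.24)). Moreover, for the particular case ξ = 0, the first-
and second-order moments are


E(X) = E\left( 1/\sqrt{V} \right) \eta_1(\tau) W\delta


(provided ν > 1) and

E(XX^T) = \frac{\nu}{\nu - 2} \left[ \Sigma + \left\{ \eta_2(\tau) + \eta_1^2(\tau) \right\} W\delta (W\delta)^T \right]


(provided ν > 2), respectively, where

\eta_k(x) = \frac{d^k}{dx^k} \log\{ 2\Phi(x) \}


for k = 0, 1, 2, ....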
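
The ingredients of these moment formulas are simple to evaluate. The sketch below is an illustration only: it assumes η_1(x) = φ(x)/Φ(x) and η_2(x) = −η_1(x){x + η_1(x)}, the first two derivatives of log Φ(x), and uses E(1/√V) = (ν/2)^{1/2} Γ((ν − 1)/2)/Γ(ν/2) for νV chi-squared with ν degrees of freedom. The function names are hypothetical.

```python
import math


def std_normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def std_normal_cdf(x):
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def eta1(x):
    """First derivative of log Phi(x)."""
    return std_normal_pdf(x) / std_normal_cdf(x)


def eta2(x):
    """Second derivative of log Phi(x)."""
    e = eta1(x)
    return -e * (x + e)


def mean_inverse_sqrt_v(nu):
    """E(V^{-1/2}) when nu*V is chi-squared with nu degrees of freedom (nu > 1)."""
    return math.sqrt(nu / 2.0) * math.gamma((nu - 1.0) / 2.0) / math.gamma(nu / 2.0)


def extended_skew_t_mean_factor(tau, nu):
    """Scalar multiplying W*delta in E(X) for the extended skewed t."""
    return mean_inverse_sqrt_v(nu) * eta1(tau)
```

At τ = 0 the factor returned by `extended_skew_t_mean_factor` equals (ν/π)^{1/2} Γ((ν − 1)/2)/Γ(ν/2), the constant in the mean of the skewed t distribution of (5.23).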


5.10 Jones’ Skewed Multivariate t Distribution 
The univariate Student’s t distribution has the pdf


\frac{\Gamma((\nu + 1)/2)}{\sqrt{\pi\nu} \, \Gamma(\nu/2)} \left( 1 + \frac{x^2}{\nu} \right)^{-(\nu+1)/2}.    (5.26)


By replacing (5.26) with the skewed univariate t pdf (4.38) in a multi-
variate distribution, Jones (2002c) introduced a new skewed multivariate
t distribution that we shall describe in this section. Let X be a p-variate
random vector having the standard multivariate t distribution with the
joint pdf given by


\frac{\Gamma((\nu + p)/2)}{(\pi\nu)^{p/2} \, \Gamma(\nu/2)} \left( 1 + \frac{x^T x}{\nu} \right)^{-(\nu+p)/2}.    (5.27)


The univariate marginals of this are (5.26). Multiplying (5.27) by (4.38)
and dividing by (5.26) yields Jones’ (2002c) skewed multivariate t dis-
tribution. The corresponding joint pdf is


\frac{2^{1-a-c} \, \Gamma((\nu + p)/2) \, \Gamma(a + c)}{(\pi\nu)^{(p-1)/2} \, \Gamma(a) \Gamma(c) \, \Gamma((\nu + 1)/2) \sqrt{a + c}}
\times \left( 1 + \frac{x_1^2}{\nu} \right)^{(\nu+1)/2} \left\{ 1 + \frac{x_1}{\sqrt{a + c + x_1^2}} \right\}^{a+1/2} \left\{ 1 - \frac{x_1}{\sqrt{a + c + x_1^2}} \right\}^{c+1/2}
\times \left\{ 1 + \frac{x^T x}{\nu} \right\}^{-(\nu+p)/2}.    (5.28)


This reduces to (5.27) for a = c = ν/2. In the bivariate case, (5.28)
is a distribution with (i) a skewed t marginal with parameters a and c
in the x_1 direction; (ii) conditional distributions of X_2 | X_1 that match
those of the bivariate t distribution, being t distributions on ν + 1 de-
grees of freedom scaled by a factor of √((ν + x_1²)/(ν + 1)); and (iii) a
diagonal correlation matrix. Another new multivariate distribution can
be obtained by replacing (5.26) by the pdf of the Gumbel distribution:
exp(−x − exp(−x)). This results in the joint pdf


\frac{\Gamma((\nu + p)/2)}{(\pi\nu)^{(p-1)/2} \, \Gamma((\nu + 1)/2)} \exp\{ -x_1 - \exp(-x_1) \}
\times \left( 1 + \frac{x_1^2}{\nu} \right)^{(\nu+1)/2} \left\{ 1 + \frac{x^T x}{\nu} \right\}^{-(\nu+p)/2}.    (5.29)


With respect to the correlation structure, this pdf has much in common
with (5.28). But the conditional distribution of X_1 given X_2, ..., X_p
and the marginals are different.
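
The marginal replacement construction behind (5.28) can be checked numerically. The sketch below is an illustration only: it builds the density as (5.27) × (4.38) / (5.26) and confirms that it collapses to (5.27) when a = c = ν/2. The form used for (4.38), with normalizing constant 2^{1−a−c} Γ(a + c)/{Γ(a)Γ(c)√(a + c)}, is an assumption here, since (4.38) is stated in an earlier chapter. The function names are hypothetical.

```python
import math


def student_t_pdf(x, nu):
    """Univariate Student t pdf, as in (5.26)."""
    log_c = (math.lgamma((nu + 1.0) / 2.0) - 0.5 * math.log(math.pi * nu)
             - math.lgamma(nu / 2.0))
    return math.exp(log_c - 0.5 * (nu + 1.0) * math.log1p(x * x / nu))


def mvt_pdf(x, nu):
    """Standard p-variate t pdf, as in (5.27); x is a sequence of length p."""
    p = len(x)
    q = sum(xi * xi for xi in x)
    log_c = (math.lgamma((nu + p) / 2.0) - 0.5 * p * math.log(math.pi * nu)
             - math.lgamma(nu / 2.0))
    return math.exp(log_c - 0.5 * (nu + p) * math.log1p(q / nu))


def skew_t_pdf(x, a, c):
    """Assumed form of the skewed univariate t pdf (4.38)."""
    s = math.sqrt(a + c + x * x)
    log_c = ((1.0 - a - c) * math.log(2.0) + math.lgamma(a + c)
             - math.lgamma(a) - math.lgamma(c) - 0.5 * math.log(a + c))
    return math.exp(log_c + (a + 0.5) * math.log1p(x / s)
                    + (c + 0.5) * math.log1p(-x / s))


def jones_skew_mvt_pdf(x, nu, a, c):
    """Marginal replacement: (5.27) times (4.38) divided by (5.26), all in x[0]."""
    return mvt_pdf(x, nu) * skew_t_pdf(x[0], a, c) / student_t_pdf(x[0], nu)
```

With a ≠ c the first coordinate is skewed while the conditional structure of the remaining coordinates given x_1 is inherited from the multivariate t.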

Fig. 5.2. Jones’ skewed multivariate t pdf (5.28) for p = 2 and (a) a = 6,
ν = 3, and c = 2; and (b) a = 2, ν = 3, and c = 6


Jones (2001a) noted that, if Y has the beta distribution with param-
eters a and c, then X = √(a + c) Y/√(1 − Y²) has the skewed univariate t
distribution given in (4.38). Jones (2002c) observed a similar relation-
ship between the joint beta pdf


21—0=cP (a + c)I'(b) a-t ai T 
Tar tyre tt a a a E 


a>0, b>1/2, c>0 


and the skewed multivariate t distribution given in (5.28) when p = 2
and b = ν/2 + 1; namely, if (Y_1, Y_2) have the former distribution, then


(X_1, X_2) = \left( \frac{\sqrt{a + c} \, Y_1}{\sqrt{1 - Y_1^2}}, \; \frac{Y_2 \sqrt{\nu + Y_1^2 (a + c - \nu)}}{\sqrt{1 - Y_1^2} \sqrt{1 - Y_1^2 - Y_2^2}} \right)

has the distribution (5.28) for p = 2.


Fig. 5.3. Jones’ skewed multivariate t pdf (5.29) for p = 2 and (a) ν = 1; and
(b) ν = 20


In the univariate case, F and skewed t (equation (4.38)) distributions
are linked in two ways that produce identical results: (i) a random
variable with any one distribution can be obtained by transforming a
random variable from the other; (ii) a random variable with each dis-
tribution can be written as a function of two independent chi-squared
random variables. If W_i ~ χ²_{2ν_i}, F_i ~ F_{2ν_i, 2ν_0}, and T_i is a random variable
with the pdf (4.38), then


T_i = \frac{\sqrt{\omega_i}}{2} \left( \sqrt{\frac{\nu_i F_i}{\nu_0}} - \sqrt{\frac{\nu_0}{\nu_i F_i}} \right),    (5.30)

F_i = \frac{\nu_0}{\nu_i \omega_i} \left( T_i + \sqrt{\omega_i + T_i^2} \right)^2,

and

T_i = \frac{\sqrt{\omega_i} \, (W_i - W_0)}{2 \sqrt{W_i W_0}},


where ω_i = ν_0 + ν_i. By extending this relationship between the univari-
ate F and the skewed univariate t, Jones (2001b) introduced another
skewed multivariate t distribution. It is known (see, for example, John-
son and Kotz, 1972, Chapter 40, and Hutchinson and Lai, 1990, Section
6.3) that the joint pdf of the random variables F_i, i = 1, ..., p is


f(f_1, \ldots, f_p) = \frac{\Gamma(\eta)}{\Gamma(\nu_0) \cdots \Gamma(\nu_p)} \prod_{i=1}^{p} \left( \frac{\nu_i}{\nu_0} \right)^{\nu_i} f_i^{\nu_i - 1} \left( 1 + \sum_{i=1}^{p} \frac{\nu_i f_i}{\nu_0} \right)^{-\eta},
f_1 > 0, \ldots, f_p > 0,    (5.31)


where η = ν_0 + ··· + ν_p. Applying the transformation (5.30) to (5.31),
Jones (2001b) obtained the joint pdf of T_i as


f(t_1, \ldots, t_p) = \frac{2^p \, \Gamma(\eta)}{\Gamma(\nu_0) \cdots \Gamma(\nu_p)} \prod_{i=1}^{p} \frac{\left( t_i + \sqrt{\omega_i + t_i^2} \right)^{2\nu_i}}{\omega_i^{\nu_i} \sqrt{\omega_i + t_i^2}}
\times \left\{ 1 + \sum_{j=1}^{p} \frac{\left( t_j + \sqrt{\omega_j + t_j^2} \right)^2}{\omega_j} \right\}^{-\eta},
t_1 \in R, \ldots, t_p \in R.    (5.32)
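
The change of variables from the multivariate F to the skewed multivariate t can be sketched in a few lines. The code below is an illustration only: it implements the map between F_i and T_i with ω_i = ν_0 + ν_i, evaluates the resulting joint density in the form obtained by applying that map to the multivariate F density, and compares the p = 1 case with the skewed univariate t pdf. The form used for (4.38), with a = ν_1 and c = ν_0, is an assumption here, and the function names are hypothetical.

```python
import math


def t_from_f(f, nu_i, nu_0):
    """Map an F_{2 nu_i, 2 nu_0} value to the skewed t scale, as in (5.30)."""
    omega = nu_0 + nu_i
    root = math.sqrt(nu_i * f / nu_0)
    return 0.5 * math.sqrt(omega) * (root - 1.0 / root)


def f_from_t(t, nu_i, nu_0):
    """Inverse of t_from_f."""
    omega = nu_0 + nu_i
    g = t + math.sqrt(omega + t * t)
    return nu_0 * g * g / (nu_i * omega)


def skew_t_pdf(t, a, c):
    """Assumed form of the skewed univariate t pdf (4.38)."""
    s = math.sqrt(a + c + t * t)
    log_c = ((1.0 - a - c) * math.log(2.0) + math.lgamma(a + c)
             - math.lgamma(a) - math.lgamma(c) - 0.5 * math.log(a + c))
    return math.exp(log_c + (a + 0.5) * math.log1p(t / s)
                    + (c + 0.5) * math.log1p(-t / s))


def jones_f_based_pdf(t, nu):
    """Joint pdf of (T_1, ..., T_p); nu = [nu_0, nu_1, ..., nu_p]."""
    p = len(t)
    eta = sum(nu)
    log_pdf = (p * math.log(2.0) + math.lgamma(eta)
               - sum(math.lgamma(v) for v in nu))
    total = 1.0
    for t_i, nu_i in zip(t, nu[1:]):
        omega = nu[0] + nu_i
        s = math.sqrt(omega + t_i * t_i)
        g = t_i + s
        log_pdf += 2.0 * nu_i * math.log(g) - nu_i * math.log(omega) - math.log(s)
        total += g * g / omega
    return math.exp(log_pdf - eta * math.log(total))
```

For strongly negative t_i the quantity t_i + √(ω_i + t_i²) suffers cancellation; a production implementation would rewrite it as ω_i/(√(ω_i + t_i²) − t_i).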


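The univariate marginals of this pdf take the form of (4.38). The con-
ditional pdf of T_i given any subset T_{i_1}, ..., T_{i_{p_2}} of the other variables,
p_2 < p, is proportional to


\frac{\left( t_i + \sqrt{\omega_i + t_i^2} \right)^{2\nu_i}}{\sqrt{\omega_i + t_i^2} \left\{ 1 + K^{-1} \omega_i^{-1} \left( t_i + \sqrt{\omega_i + t_i^2} \right)^2 \right\}^{\nu_0 + \nu_i + \nu_{i_1} + \cdots + \nu_{i_{p_2}}}},


where


K = 1 + \sum_{l=1}^{p_2} \omega_{i_l}^{-1} \left( t_{i_l} + \sqrt{\omega_{i_l} + t_{i_l}^2} \right)^2.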


The regression of T_i given T_{i_1}, ..., T_{i_{p_2}} takes the nonlinear form


E\left( T_i \mid T_{i_1}, \ldots, T_{i_{p_2}} \right) = \frac{\sqrt{\omega_i} \, \Gamma(\nu_i - 1/2) \Gamma(\psi - 1/2)}{2 \Gamma(\nu_i) \Gamma(\psi)} \left\{ (\nu_i - 1/2) \sqrt{K} - \frac{\psi - 1/2}{\sqrt{K}} \right\},


where ψ = ν_0 + ν_{i_1} + ··· + ν_{i_{p_2}}. Note that the corresponding relation
for the multivariate F distribution in (5.31) is linear. If T_{i_1}, ..., T_{i_m}
denote any m of the p T_i’s (with their degrees of freedom correspondingly
renumbered as ν_1, ..., ν_m along with ν_0), then the product moment of
T_1^{λ_1} ··· T_m^{λ_m} is
m wal? Am [2 P 
À; "e Wm Ài 
hii = Fis i i 
(fe) = pete Sie (Bae) 


Eeit) 


xT a — Dii itn) , 


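provided that ν_i > λ_i/2, i = 1, ..., m, and ν_0 > (λ_1 + ··· + λ_m)/2. In
particular, the variances and the covariances are given by


Var(T_i) = \frac{\omega_i}{4} \left[ \frac{(\nu_i - \nu_0)^2 + \omega_i - 2}{(\nu_0 - 1)(\nu_i - 1)} - \left\{ \frac{(\nu_i - \nu_0) \Gamma(\nu_0 - 1/2) \Gamma(\nu_i - 1/2)}{\Gamma(\nu_0) \Gamma(\nu_i)} \right\}^2 \right]


(provided ν_0 > 1 and ν_i > 1) and


Cov(T_i, T_j) = \frac{\sqrt{\omega_i \omega_j}}{4} \frac{\Gamma(\nu_i - 1/2) \Gamma(\nu_j - 1/2)}{\Gamma(\nu_i) \Gamma(\nu_j)}
\times \left[ \frac{(\nu_i - 1/2)(\nu_j - 1/2)}{\nu_0 - 1} - (\nu_i + \nu_j - 1) + \nu_0 - (\nu_i - \nu_0)(\nu_j - \nu_0) \frac{\Gamma^2(\nu_0 - 1/2)}{\Gamma^2(\nu_0)} \right]


(provided ν_i > 1/2, ν_j > 1/2 and ν_0 > 1), respectively. In the particular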
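case ν_0 = ··· = ν_p = ν/2, (5.32) reduces to


f(t_1, \ldots, t_p) = \frac{2^p \, \nu^{\nu/2} \, \Gamma((p + 1)\nu/2)}{\Gamma^{p+1}(\nu/2)} \prod_{i=1}^{p} \frac{\left( t_i + \sqrt{\nu + t_i^2} \right)^{\nu}}{\sqrt{\nu + t_i^2}}
\times \left\{ \nu + \sum_{j=1}^{p} \left( t_j + \sqrt{\nu + t_j^2} \right)^2 \right\}^{-(p+1)\nu/2},
t_1 \in R, \ldots, t_p \in R.    (5.33)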


Jones (2001b) referred to this distribution as the symmetric multivariate
t distribution. Note that all of the marginals of (5.33) have the Student’s
t distribution with degrees of freedom ν. The correlation between any
two T_i’s in (5.33) takes the simple form


Corr(T_i, T_j) = \frac{2\nu - 3}{8} \left\{ \frac{\Gamma((\nu - 1)/2)}{\Gamma(\nu/2)} \right\}^2,


provided that ν > 2.
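
The correlation of the symmetric multivariate t distribution can be checked by simulation from the chi-squared representation. The sketch below is an illustration only: it draws T_i = (√ν/2){√(W_i/W_0) − √(W_0/W_i)} with independent W_0, W_1, W_2 each chi-squared on ν degrees of freedom, and compares the sample correlation with the closed form {(2ν − 3)/8}{Γ((ν − 1)/2)/Γ(ν/2)}². That closed form was obtained for this sketch from the representation itself and should be checked against Jones (2001b). The function names are hypothetical.

```python
import math
import random


def symmetric_t_correlation(nu):
    """Closed-form correlation implied by the chi-squared representation (nu > 2)."""
    ratio = math.exp(math.lgamma((nu - 1.0) / 2.0) - math.lgamma(nu / 2.0))
    return (2.0 * nu - 3.0) / 8.0 * ratio * ratio


def sample_pair(nu, rng):
    """One draw of (T_1, T_2) sharing the same denominator chi-square W_0."""
    w0, w1, w2 = (rng.gammavariate(nu / 2.0, 2.0) for _ in range(3))
    half = 0.5 * math.sqrt(nu)
    t1 = half * (math.sqrt(w1 / w0) - math.sqrt(w0 / w1))
    t2 = half * (math.sqrt(w2 / w0) - math.sqrt(w0 / w2))
    return t1, t2


def sample_correlation(nu, n, seed=0):
    """Sample correlation of n simulated pairs."""
    rng = random.Random(seed)
    pairs = [sample_pair(nu, rng) for _ in range(n)]
    m1 = sum(a for a, _ in pairs) / n
    m2 = sum(b for _, b in pairs) / n
    s11 = sum((a - m1) ** 2 for a, _ in pairs)
    s22 = sum((b - m2) ** 2 for _, b in pairs)
    s12 = sum((a - m1) * (b - m2) for a, b in pairs)
    return s12 / math.sqrt(s11 * s22)
```

As ν grows the closed form approaches 1/2, in agreement with the intraclass correlation of the limiting normal distribution.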
The limiting form of (5.32) as ν_0 → ∞ and ν_i > 1 remains fixed,
i = 1, ..., p, can be shown to be


\left( \prod_{i=1}^{p} \sigma_i \right) g\left( \mu_1 + \sigma_1 t_1, \ldots, \mu_p + \sigma_p t_p \right),


where

g(t_1, \ldots, t_p) = 2^p \prod_{i=1}^{p} \frac{t_i^{-(2\nu_i + 1)}}{\Gamma(\nu_i)} \exp\left( -\frac{1}{t_i^2} \right),

\mu_i = \frac{\Gamma(\nu_i - 1/2)}{\Gamma(\nu_i)},

and


\sigma_i^2 = \frac{1}{\nu_i - 1} - \frac{\Gamma^2(\nu_i - 1/2)}{\Gamma^2(\nu_i)}.


Note that μ_i and σ_i are the mean and the standard deviation of the √(2/χ²_{2ν_i})
distribution. When ν_0 remains fixed but ν_1, ..., ν_p → ∞, the marginals
of (5.32) tend to the √(2/χ²_{2ν_0}) distribution, but the correlations between the
T_i’s tend to 1 and the joint distribution becomes degenerate. When all
ν_0, ν_1, ..., ν_p → ∞, all of the marginals tend to the normal distribution,
but the form of the limiting joint distribution will depend on the
specific relationships between the ν’s. The limit of (5.33) as ν → ∞
is the multivariate normal distribution with zero means, unit variances,
and an intraclass correlation structure with correlation 1/2.


Fig. 5.4. Jones’ skewed multivariate t pdf (5.32) for p = 2 and (a) ν_0 = 2,
ν_1 = 4, and ν_2 = 4; (b) ν_0 = 2, ν_1 = 20, and ν_2 = 1; (c) ν_0 = 2, ν_1 = 1, and
ν_2 = 20; and (d) ν_0 = ν_1 = ν_2 = 2


5.11 Matrix-Variate t Distribution 


The matrix-variate t distribution, motivated by applications in Bayesian
inference, is the product of James Dickey’s research in the mid-1960s. We
need the following terminology to discuss its mathematical properties.
Let μ be a p × q constant matrix, let R > 0 be a p × p matrix, and let
Q > 0 be a q × q matrix. For m > p + q − 1 define


k(m, p, q) = \pi^{pq/2} \, \frac{\Gamma_p((m - q)/2)}{\Gamma_p(m/2)},    (5.34)


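where


\Gamma_p(z) = \pi^{p(p-1)/4} \, \Gamma(z) \, \Gamma\left( z - \frac{1}{2} \right) \cdots \Gamma\left( z - \frac{p}{2} + \frac{1}{2} \right)


is the generalized gamma function. Furthermore, for real or complex
constants a_1, ..., a_p and b_1, ..., b_q and for random matrices S and T,
define the general hypergeometric functions (see Constantine, 1963)


{}_pF_q(a_1, \ldots, a_p; b_1, \ldots, b_q; S) = \sum_{k=0}^{\infty} \sum_{\kappa} \frac{(a_1)_\kappa \cdots (a_p)_\kappa}{(b_1)_\kappa \cdots (b_q)_\kappa} \frac{C_\kappa(S)}{k!}    (5.35)


and

{}_pF_q(a_1, \ldots, a_p; b_1, \ldots, b_q; S, T) = \sum_{k=0}^{\infty} \sum_{\kappa} \frac{(a_1)_\kappa \cdots (a_p)_\kappa}{(b_1)_\kappa \cdots (b_q)_\kappa} \frac{C_\kappa(S) C_\kappa(T)}{C_\kappa(I_m) \, k!},    (5.36)


where κ = {k_1, ..., k_m}, k_1 ≥ k_2 ≥ ··· ≥ k_m ≥ 0, k_1 + k_2 + ··· + k_m = k,

(z)_\kappa = \frac{\Gamma_m(z, \kappa)}{\Gamma_m(z)},

\Gamma_m(z, \kappa) = \pi^{m(m-1)/4} \, \Gamma(z + k_1) \, \Gamma\left( z + k_2 - \frac{1}{2} \right) \cdots \Gamma\left( z + k_m - \frac{m - 1}{2} \right),


and C_κ(S) and C_κ(T) are symmetric homogeneous polynomials of degree
k in the latent roots of S and T, respectively.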
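
The generalized gamma function is best evaluated on the log scale. The sketch below is an illustration only: it computes log Γ_p(z) from the product formula and, assuming the normalizing constant has the form k(m, p, q) = π^{pq/2} Γ_p((m − q)/2)/Γ_p(m/2), evaluates log k(m, p, q). The function names are hypothetical.

```python
import math


def log_generalized_gamma(z, p):
    """log Gamma_p(z) = (p(p-1)/4) log(pi) + sum_{j=0}^{p-1} log Gamma(z - j/2)."""
    if z <= (p - 1) / 2.0:
        raise ValueError("need z > (p - 1)/2")
    return (p * (p - 1) / 4.0 * math.log(math.pi)
            + sum(math.lgamma(z - j / 2.0) for j in range(p)))


def log_k(m, p, q):
    """log of the assumed normalizing constant k(m, p, q); needs m > p + q - 1."""
    if m <= p + q - 1:
        raise ValueError("need m > p + q - 1")
    return (p * q / 2.0 * math.log(math.pi)
            + log_generalized_gamma((m - q) / 2.0, p)
            - log_generalized_gamma(m / 2.0, p))
```

Under this assumed form the constant is symmetric in the roles of p and q, that is, Γ_p((m − q)/2)/Γ_p(m/2) = Γ_q((m − p)/2)/Γ_q(m/2), and for p = 1 it reduces to π^{q/2} Γ((m − q)/2)/Γ(m/2), the reciprocal of the familiar multivariate t constant apart from the scale factor.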

A p × q random matrix X is said to have the matrix-variate t distri-
bution with parameters μ, R, Q, and m if its joint pdf is


f(X) = \frac{1}{k(m, p, q)} \, |Q|^{(m-p)/2} \, |R|^{-q/2} \, \left| Q + (X - \mu)^T R^{-1} (X - \mu) \right|^{-m/2}    (5.37)

(Dickey, 1966a, 1967b). If μ = 0, then we say that X has the central
matrix-variate t distribution with parameters R, Q, and m. Otherwise,
we refer to the distribution as a noncentral matrix-variate t. The usual
multivariate t distribution (1.1) is the special case of (5.37) for p = 1
(single row) or q = 1 (single column). It is also known that the partic-
ular case of (5.37) for μ = 0 and R = I_p is a mixture, over the q × q
positive definite scale matrix V, of normal densities with zero means
and covariance matrix I_p ⊗ V. Densities of the form (5.37) appear in
the frequentist approach to normal regression as the distribution of the 
Studentized error, both the error in the least squares estimate of the 
coefficients matrix and the error in the corresponding predictor of a fu- 
ture data array (Cornish, 1954; Kshirsagar, 1961; Kiefer and Schwartz, 
1965). In the Bayesian conjugate-prior and diffuse-prior analyses for the 
same sampling models, it arises as the marginal prior or posterior dis- 
tribution of the unknown coefficients matrix, and also as the predictive 
distribution of a future data array (Geisser and Cornfield, 1963; Ando 
and Kaufman, 1965; Geisser, 1965; Dickey, 1967b, Section 4; Zellner, 
1971, Chapter 8; Press, 1972, Section 8.6). More recently, Van Dijk 
(1985, 1986) discussed applications of (5.37) in the linear simultane- 
ous equation (SEM) model, which is one of the best-known models in 
econometrics. The SEM model is used in several areas, for instance, in 
microeconomic modeling for the description of the operation of a mar- 
ket for a particular economic commodity and in macroeconomic model- 
ing for the description of the interrelations between a large number of 
macroeconomic variables. 

If X has the central matrix-variate t distribution with parameters R,
Q, and m, then it can be represented in numerous ways, as described by
Dickey (1967b) and Dawid (1981). The following results (due to Dickey,
1967b, and Dickey et al., 1986) concern the conditional and the marginal
distributions of X:


e If X = (X,,X2)", then the conditional distribution of X,, given 
Xo, is the matrix-variate t with parameters —R1,;Rjy X2, RI}, Q + 
XTR Xo, and m. 

• If X = (X_1, X_2), then the conditional distribution of X_1, given X_2, is
a matrix-variate t with parameters X_2 Q_22^{-1} Q_21, (R + X_2 Q_22^{-1} X_2^T)^{-1},
Q_11 − Q_12 Q_22^{-1} Q_21, and m.

e If X = (Xi, X2)T, where X; is p; xq, then the marginal distribution of 
X, is a central matrix-variate t with parameters Ro Q and m — pı. 
In the particular case X^T = (x_1, ..., x_p), each row x_i^T has the central
multivariate t distribution with degrees of freedom m − p − q + 1 and
correlation matrix r_ii Q/(m − p − q + 1). A consequence of this is that
the density (5.37) of X can be written as the product of conditional
multivariate t distributions of the rows of X, that is,


f(X) = f(x_1) \, f(x_2 \mid x_1) \cdots f(x_p \mid x_1, \ldots, x_{p-1}).


e If X = (X1, X2), where X; is p X qj, then the marginal distribution of 
X, is a central matrix-variate t with parameters R71, Q22, and m—q1. 
In the particular case X = (x_1, ..., x_q), each column x_j has the central
multivariate t distribution with degrees of freedom m − p − q + 1 and
correlation matrix q_jj R/(m − p − q + 1). A consequence of this is that
the density (5.37) of X can be written as the product of conditional
multivariate t distributions of the columns of X, that is,


f(X) = f(x_1) \, f(x_2 \mid x_1) \cdots f(x_q \mid x_1, \ldots, x_{q-1}).
e If X is doubly partitioned as 
Xi1 Xir ) 
X = 
( Xa Xz J’ 

where Xj; is p; X qj with pı + p2 = p and q + q2 = q, then the condi- 
tional distribution of x7, given X1; and X3; is a matrix-variate ¢ with 
parameters R7,(R7,)-!X7,, Qu + XIRI Xu, Ree — Rei RI} Riz, 
and m+qı —p—q+1. (Here, the partitions of R and Q correspond to 


the partition of X.). Since this depends only on X41, it follows that 
X12 and X2; given Xj); are conditionally independent. 


The following results (due to Javier and Gupta, 1985, and Dickey
et al., 1986) concern the distributions of the quadratic forms XAX^T
and AXB when X has the central matrix-variate t distribution with
parameters R, Q, and m.


• If A > 0 is q × q, then the pdf of W = XAX^T is given by
1 = m= = erent Fe 
f(W) = konna a p/2 [R|‘ 0/2 IQ] p/2 [w]e p—1)/2 


x |R + W7”? 
x Fo (F (R+W) W, h - (QA)™), 


where W > 0, k(m, p,q) is given by (5.34), and ıFo is as defined in 
(5.35). An immediate consequence of this result is that 


/ Jw re -D/2 IR + wl? 
W>o 


xF (5; (R+ W)! W, L - (Qa)*) dw 
2 T aori- q)/2) |A|?/? [R 7079/2 QP? j
Hence the hth moment of | W | is 
I ((g + 2h)/2)T ((m — q — 2h)/2) 
I (m/2) 
x [APA IRI QP. (5.38) 


pwe 


Further using the fact that an F-distribution is uniquely determined 
by its moments, it follows that | W | can be written as a product of 
q independent univariate F’s, that is, 


|W] ~ I[F@-G-9,m-a-G-1). 


For the special case A = I, and p = q, (5.38) gives the hth moment 
of XXT. 

• If A > 0 is p × p and B > 0 is q × q, then AXB has the central
matrix-variate t distribution with parameters B^T Q B, A R^{-1} A^T, and
m, m > p + q − 1.

• If A > 0 is p × p and B is a q × r rectangular matrix, then AXB
has the central matrix-variate t distribution with parameters B^T Q B,
A R^{-1} A^T, and m, m > p + r − 1.

• If a is a q × 1 vector, then a^T X^T has the central multivariate t distri-
bution with degrees of freedom m − p − q + 1 and correlation matrix
a^T Q a R/(m − p − q + 1).

• If a is a q × 1 vector such that a^T a = 1 and b is a p × 1 vector, then
a^T X^T b is a linear combination of Student’s t random variables.

• If b is a p × 1 vector, then X^T b has the central multivariate t distri-
bution with degrees of freedom m − p − q + 1 and correlation matrix
b^T R b Q/(m − p − q + 1).

• If a is a q × 1 vector and b is a p × 1 vector, then

\frac{\sqrt{m - p - q + 1} \; a^T X^T b}{\sqrt{(a^T Q a)(b^T R b)}}

has the Student’s t distribution with degrees of freedom m − p − q + 1.

• In the special case R = I_p and Q = I_q, if a is a real number and b
is a q × 1 vector such that a² b^T b = 1, then aXb has the Student’s t
distribution with degrees of freedom m − q.


Javier and Gupta (1985) also derived a useful factorization of the cen-
tral matrix-variate t density in terms of the product of q − 1 independent
univariate F densities and q independent multivariate t densities, par-
alleling the result of Tan (1969a) for matrix-variate beta distributions.
Let X be a p × q random matrix having the density (5.37) with μ = 0.
Set 

U = (R7?xQ-¥?) (Rx), 


so that U is p x p, symmetric, and U > 0. Partition U as 
Ui Uie ) 
U = 
( Un Ux 
so that Uy, is 1 x 1 and Ug, is (p — 1) x (p — 1). Abbreviating Do. — 
Dza DI Diy by Də2.1, define the following submatrices 


F 1 p 
u% = (us, sae ; J= 1l, 2, ssa p= 1, 


u®, = Uz1, 


l a 
UY = (UR). F=L2 PH, 
and 
uV = Un, 


so that UY), is (p — j) x (p — j) and UY is 1 x 1. With all of this 
notation the factorization of the density of X (due to Javier and Gupta) 
can be stated as 


q—2 y . , 
f(X) = [r (1 nte) 
j=0 


p 
x II tu {14u} (zi a UŞ; m =g= 1)a) ; 
E iom 


where t,(T;r) is the joint pdf of a central multivariate ¢ distribution 
with degrees of freedom v and correlation matrix T and F(a, £) is the 
pdf of a univariate F distribution with degrees of freedom a and £. 

The two predictivistic characterizations of the multivariate ¢ distribu- 
tion based on (5.14) and (5.15) have the following matrix-variate gener- 
alizations 


• Let X_1, X_2, ... be an infinite sequence of q-dimensional random col-
umn vectors that are orthogonally invariant (which means that, for
each k, X^{(k)} = (X_1, ..., X_k)^T and ΓX^{(k)} are identically distributed,
for all k × k orthogonal matrices Γ) and, for k fixed, let X_i^{(k)} =
(X_{(i−1)k+1}, ..., X_{ik})^T, i = 1, 2, .... If X_1, ..., X_q are linearly inde-
pendent with probability 1 and


T 
E xf? x 


T 
xt] = ax x48, 


where 0 < a < 1 and B is a q x q positive definite matrix, then 
the distribution of X‘) is the matrix-variate t with p = 0, R = Ip, 
Q = (1/a)B, and m = 1+ (p/a) —p. 

e Let X,,Xo2,... be an infinite sequence of g-dimensional random col- 
umn vectors such that, for each p, X®) and IX) are identically 
distributed, for all p x p orthogonal matrices I satisfying T'1, = 1p 
(where 1, is a p-dimensional vector of 1’s). Under this assumption 
there exists a o-algebra T of events such that 


2 1 
xX, = 22 
> E(Xı|T)=M 


and 


> E(XıXT|T)-E(X | N{E(XT|T)} =V 


as n —> oo (Chow and Teicher, 1978), where the convergence is almost 
everywhere. Moreover, if 


E (xe s 1M"): (xi - 14M") | x, m] 


=a (x? = 1M"). (xi z 1M”) +B, 


where 0 <a < 1 and B is a q x q symmetric positive definite matrix, 
then X) is a location mixture of the matrix-variate t distribution 
with u = 17M, R = Ip, Q = (1/a)B, and m = 1 + (p/a) — p. In 
addition, M and V are independent. 


Dawid (1981) provided a different but more convenient parameteri-
zation of (5.37). If Y (p × p) has the standard matrix inverse Wishart
distribution with parameter δ and if, given Y, X (n × p) has the ma-
trix normal distribution with parameters I_n and Y, then X is termed
as having the standard matrix t distribution. In the notation of (5.37),
this would correspond to μ = 0, R = I_n, Q = I_p, and m = δ + n + p − 1.
Under Dawid’s parameterization, if X* is an n* × p* submatrix of X, then
X* has the matrix t distribution with parameters I_{n*}, I_{p*}, and δ. Note
that δ is unchanged. This kind of consistency enabled Dawid (1981)
to construct what is termed the standard infinite matrix t distribu-
tion. Namely, X = {x_ij, i ≥ 1, j ≥ 1} is said to have the above-named
distribution if it has the property that for all (n, p) the leading n × p
submatrix of X has the standard matrix t distribution with parameter
δ. The standard matrix t distribution also has the attractive property
of being spherical, that is, if P (n × n) and Q (p × p) are two orthogonal
matrices, then both PX and XQ have the same distribution as X.


5.12 Complex Multivariate t Distribution 


A complex normal random variable Y = V + √−1 W is a complex random
variable whose real and imaginary parts possess the bivariate normal
distribution. A complex p-variate normal random vector


Y = V + \sqrt{-1} \, W = \left( V_1 + \sqrt{-1} W_1, \; V_2 + \sqrt{-1} W_2, \; \ldots, \; V_p + \sqrt{-1} W_p \right)^T    (5.39)


is a p-tuple of complex normal random variables such that the vector of
real and imaginary parts (V_1, W_1, ..., V_p, W_p) has the 2p-variate normal
distribution (Goodman, 1963). Section 45.13 of Kotz et al. (2000) pro-
vides an account of this distribution. It is usually assumed that the
2p-variate normal distribution of (V_1, W_1, ..., V_p, W_p) has zero means
and covariance matrix given by


\frac{1}{2} \begin{pmatrix} \Sigma_1 & -\Sigma_2 \\ \Sigma_2 & \Sigma_1 \end{pmatrix},

where Σ_1 is symmetric (a matrix A is symmetric if A^T = A) and Σ_2
is skew-symmetric (a matrix A is skew-symmetric if A = −A^T). From
the given structure it is easily seen that the covariances of the p-variate
vectors V and W are each equal to Σ_1/2 and the covariance between V
and W is equal to −Σ_2/2. Hence the covariance of the complex p-variate
normal random vector Y in (5.39) is Σ_1 + √−1 Σ_2 = Σ, say. The
properties of the distribution of Y have been studied by many authors.
The joint pdf of Y is given by


f(y) = \frac{1}{\pi^p |\Sigma|} \exp\left\{ -\bar{y}^T \Sigma^{-1} y \right\},    (5.40)
where ȳ denotes the complex conjugate of y (Goodman, 1963). For ex-
ample, for the complex univariate normal distribution, y = v + √−1 w
and the covariance matrix Σ = σ², and thus the joint pdf of Y becomes


f(y) = \frac{1}{\pi\sigma^2} \exp\left\{ -\frac{v^2 + w^2}{\sigma^2} \right\}.


The characteristic function of Y can be shown to be

E\left[ \exp\left\{ i \left( s^T V + t^T W \right) \right\} \right] = \exp\left\{ -\frac{1}{4} \bar{u}^T \Sigma u \right\},


where u = s + √−1 t (Wooding, 1956). Explicit expressions for the
moments of Y have been derived by Sultan and Tracy (1996). The
complex multivariate normal distribution has applications in describing
the statistical variability of estimators for the spectral density matrix of
a multiple stationary normal time series and in describing the statistical
variability of estimators for functions of the elements of a spectral density
matrix of a multiple stationary normal time series.
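
The characteristic function can be verified directly from the real covariance structure. The sketch below is an illustration only: it assumes that V and W each have covariance Σ_1/2 and that the covariance between V and W is −Σ_2/2, computes exp{−Var(s^T V + t^T W)/2} in real arithmetic, and compares it with exp{−ū^T Σ u/4} for Σ = Σ_1 + √−1 Σ_2 and u = s + √−1 t. The function names are hypothetical.

```python
import math


def quad(x, m, y):
    """Bilinear form x^T m y for plain Python lists."""
    n = len(x)
    return sum(x[i] * m[i][j] * y[j] for i in range(n) for j in range(n))


def cf_real_form(s, t, sigma1, sigma2):
    """exp(-Var/2), with Var the variance of s^T V + t^T W under the assumed structure."""
    var = 0.5 * (quad(s, sigma1, s) + quad(t, sigma1, t)) - quad(s, sigma2, t)
    return math.exp(-0.5 * var)


def hermitian_form(s, t, sigma1, sigma2):
    """conj(u)^T Sigma u with Sigma = sigma1 + i*sigma2 and u = s + i*t."""
    n = len(s)
    u = [complex(s[i], t[i]) for i in range(n)]
    return sum(u[i].conjugate() * complex(sigma1[i][j], sigma2[i][j]) * u[j]
               for i in range(n) for j in range(n))


def cf_complex_form(s, t, sigma1, sigma2):
    """exp(-conj(u)^T Sigma u / 4); the Hermitian form is real up to rounding."""
    return math.exp(-0.25 * hermitian_form(s, t, sigma1, sigma2).real)
```

The two forms agree for any symmetric Σ_1 and skew-symmetric Σ_2, because the skew-symmetry of Σ_2 makes the Hermitian form real.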

Relatively few results are available that deal with complex multivari-
ate t distributions. Originally, the complex multivariate t distribution
was introduced by Gupta (1964). Let Y have the complex p-variate nor-
mal distribution with zero means, common variance σ², and covariance
matrix σ²R. Let 2νS²/σ² have the chi-squared distribution with de-
grees of freedom 2ν, distributed independently of Y. Then X = Y/S is
said to have the complex p-variate t distribution with degrees of freedom
ν and correlation matrix R. By writing down the joint distribution of
S and X and then integrating out S, the pdf of X can be obtained as


f(x) = \frac{\Gamma(\nu + p)}{(\pi\nu)^p \, \Gamma(\nu) \, |R|} \left\{ 1 + \frac{1}{\nu} \bar{x}^T R^{-1} x \right\}^{-(\nu+p)}.

Tan (1973) discussed some properties of this distribution. Tan (1969b)
provided a brief discussion of a complex analog of the matrix-variate t
distribution given by (5.37).


5.13 Steyn’s Nonnormal Distributions 


Strictly speaking, this section does not deal with multivariate t distri-
butions per se. This section is about nonnormal distributions arising
from the class of multivariate elliptical distributions that contains the
multivariate t as a particular case.

One weakness of the class of multivariate elliptical distributions is that
all fourth-order cumulants are expressed in terms of a single kurtosis pa-
rameter (moreover, the univariate marginals have zero skewness and the
same kurtosis). In fact, the cumulant generating function (cgf) and the
moment generating function (mgf) of a p-variate elliptical distribution
with zero means and correlation matrix R are


K(t_1, \ldots, t_p) = \frac{1}{2} \left( t^T R t \right) + \frac{\kappa}{8} \left( t^T R t \right)^2 + \sum_{k \ge 3} A_k \left( t^T R t \right)^k    (5.41)

and

M(t_1, \ldots, t_p) = \exp\left( \frac{t^T R t}{2} \right) \left\{ 1 + \frac{\kappa}{8} \left( t^T R t \right)^2 + \sum_{k \ge 3} B_k \left( t^T R t \right)^k \right\},    (5.42)


respectively, where A_k, B_k are constants and κ is the kurtosis parameter.
Steyn (1993) attempted to introduce meaningful multivariate distribu-
tions that are related to the elliptical distributions and that contain
more than one kurtosis parameter.

As an example, consider a random vector (X_1, X_2, X_3) possessing the
three-dimensional normal distribution with the mgf


M(t_1, t_2, t_3) = \exp\left[ \frac{1}{2} \left( t_1^2 + t_2^2 + t_3^2 + 2 r_{12} t_1 t_2 + 2 r_{13} t_1 t_3 + 2 r_{23} t_2 t_3 \right) \right].    (5.43)


Suppose this model is placed in a changing environment that favors a
change in one of the random variables, say X_1, in such a way that the
kurtosis should be taken into consideration. Specifically, assume that
the marginal distribution of X_1 is elliptical with the kurtosis parameter
κ_1, while the conditional distribution of (X_2, X_3) given X_1 = x_1 remains
unchanged. Note that (5.43) can be written as


M(t_1, t_2, t_3) = \frac{1}{\sqrt{2\pi}} \exp\left[ \frac{1}{2} \left\{ (1 - r_{12}^2) t_2^2 + (1 - r_{13}^2) t_3^2 + 2 (r_{23} - r_{12} r_{13}) t_2 t_3 \right\} \right]
\times \int_{-\infty}^{\infty} \exp\left\{ -\frac{x_1^2}{2} + (t_1 + r_{12} t_2 + r_{13} t_3) x_1 \right\} dx_1.    (5.44)


Changing the probability element in the integrand in (5.44) to that of
the elliptical distribution in (5.42), one can show that the mgf changes
to


M_1(t_1, t_2, t_3) = M(t_1, t_2, t_3) \exp\left\{ \frac{\kappa_1}{8} (t_1 + r_{12} t_2 + r_{13} t_3)^4 + \cdots \right\}.

The corresponding cgf becomes

K_1(t_1, t_2, t_3) = \frac{1}{2} \left( t_1^2 + t_2^2 + t_3^2 + 2 r_{12} t_1 t_2 + 2 r_{13} t_1 t_3 + 2 r_{23} t_2 t_3 \right)
+ \frac{1}{8} \kappa_1 (t_1 + r_{12} t_2 + r_{13} t_3)^4 + \cdots.    (5.45)


Setting t_2 = t_3 = 0, the cgf of the marginal distribution of X_1 is given
by


K_1(t_1, 0, 0) = \frac{1}{2} t_1^2 + \frac{1}{8} \kappa_1 t_1^4 + \cdots,


which shows (as it should) that the marginal distribution of X_1 is ellip-
tical with kurtosis parameter κ_1 (compare with equation (5.41)). How-
ever, for t_1 = t_3 = 0 and t_1 = t_2 = 0, one obtains


K_1(0, t_2, 0) = \frac{1}{2} t_2^2 + \frac{1}{8} \kappa_1 (r_{12} t_2)^4 + \cdots

and

K_1(0, 0, t_3) = \frac{1}{2} t_3^2 + \frac{1}{8} \kappa_1 (r_{13} t_3)^4 + \cdots;


thus, the marginal distributions of X_2 and X_3 are also elliptical but
with kurtosis parameters κ_1 r_12⁴ and κ_1 r_13⁴, respectively. Furthermore,
for t_1 = 0,


K_1(0, t_2, t_3) = \frac{1}{2} \left( t_2^2 + 2 r_{23} t_2 t_3 + t_3^2 \right) + \frac{1}{8} \kappa_1 (r_{12} t_2 + r_{13} t_3)^4 + \cdots,


which shows that the joint marginal distribution of (X_2, X_3) is not el-
liptical. The fourth-order cumulants are easily obtained from (5.45) as
κ_ijk = 3 κ_1 r_12^j r_13^k, where i + j + k = 4.
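
The fourth-order cumulants follow from the multinomial expansion of the quartic term in the cgf. The sketch below is an illustration only: it takes the coefficient of t_1^i t_2^j t_3^k in (κ_1/8)(t_1 + r_12 t_2 + r_13 t_3)⁴ and multiplies by i! j! k! to obtain the cumulant, which should equal 3 κ_1 r_12^j r_13^k. The function name is hypothetical.

```python
from math import factorial


def fourth_order_cumulant(i, j, k, kappa1, r12, r13):
    """Cumulant kappa_{ijk} implied by (kappa1/8)(t1 + r12*t2 + r13*t3)^4."""
    if i + j + k != 4 or min(i, j, k) < 0:
        raise ValueError("need nonnegative i, j, k with i + j + k = 4")
    orderings = factorial(i) * factorial(j) * factorial(k)
    # Multinomial coefficient of t1^i t2^j t3^k in the fourth power.
    multinomial = factorial(4) // orderings
    coefficient = kappa1 / 8.0 * multinomial * r12 ** j * r13 ** k
    # A mixed partial derivative of order (i, j, k) at zero picks up i! j! k!.
    return orderings * coefficient
```

For example, `fourth_order_cumulant(0, 4, 0, kappa1, r12, r13)` divided by 3 recovers the kurtosis parameter κ_1 r_12⁴ of the marginal distribution of X_2.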

Suppose now that the model given by (5.43) is placed in an environ-
ment that favors a change in not only X_1 but also influences (X_2, X_3).
Assume, in particular, that the conditional distributions of X_2 given
X_1 = x_1 and X_3 given (X_1, X_2) = (x_1, x_2) are elliptical with kurtosis
parameters κ_2 and κ_3, respectively. Then calculations similar to those
above show that the mgf (5.43) changes to


M_2(t_1, t_2, t_3) = M(t_1, t_2, t_3) \exp\left[ \frac{1}{2} \kappa_1 \left( \frac{u_1^2}{2} \right)^2 + \frac{1}{2} \kappa_2 \left( \frac{\sigma_{2.1}^2 u_2^2}{2} \right)^2 + \frac{1}{2} \kappa_3 \left( \frac{\sigma_{3.12}^2 t_3^2}{2} \right)^2 + \cdots \right],    (5.46)


where

u_1 = t_1 + r_{12} t_2 + r_{13} t_3,

u_2 = t_2 + \frac{r_{23} - r_{12} r_{13}}{1 - r_{12}^2} t_3,

and

\sigma_{3.12}^2 = 1 - r_{13}^2 - \frac{(r_{23} - r_{12} r_{13})^2}{1 - r_{12}^2}.


It is easily seen that the marginal distributions of X_1, X_2, and X_3
are elliptical with kurtosis parameters given by κ_1, r_12⁴ κ_1 + σ_{2.1}⁴ κ_2, and
r_13⁴ κ_1 + (σ²_{3.1} − σ²_{3.12})² κ_2 + σ_{3.12}⁴ κ_3, where σ²_{2.1} = 1 − r_12² and
σ²_{3.1} = 1 − r_13². This time, the fourth-order cumulants are given by

\kappa_{ijk} = 3 \left\{ r_{12}^j r_{13}^k \kappa_1 + \sigma_{2.1}^{2(j-2)} (r_{23} - r_{12} r_{13})^k \kappa_2 \right\},

where i + j + k = 4 and the second term is present only when i = 0. In
the case of κ_004, 3 κ_3 σ_{3.12}⁴ should be added.

Similar constructions can be performed when X = (X_1, ..., X_p)^T has
a p-variate normal distribution with zero means, covariance matrix R,
and the corresponding mgf


M(t_1, \ldots, t_p) = \exp\left( \frac{1}{2} t^T R t \right).    (5.47)

Consider two environments similar to those considered above for the
trivariate normal model. First, divide X into two random vectors X^{(1)} =
(X_1, ..., X_h)^T and X^{(2)} = (X_{h+1}, ..., X_p)^T, and let


R = \begin{pmatrix} R_{11} & R_{12} \\ R_{21} & R_{22} \end{pmatrix}

be the corresponding partition of the correlation matrix. Also let t^{(1)} =
(t_1, ..., t_h)^T and t^{(2)} = (t_{h+1}, ..., t_p)^T be the corresponding partition
of t. Now assume that the marginal distribution of X^{(1)} is changed
to an h-dimensional elliptical distribution with kurtosis parameter κ_1
and that the conditional distribution of X^{(2)} given X^{(1)} = x^{(1)} remains
unchanged. Then calculations show that (5.47) changes to


M_1(t) = M(t) \exp\left[ \frac{\kappa_1}{8} \left\{ \left( t^{(1)} + R_{11}^{-1} R_{12} t^{(2)} \right)^T R_{11} \left( t^{(1)} + R_{11}^{-1} R_{12} t^{(2)} \right) \right\}^2 + \cdots \right].    (5.48)


Clearly,


M_1\left( t^{(1)}, 0 \right) = \exp\left[ \frac{1}{2} t^{(1)T} R_{11} t^{(1)} + \frac{1}{8} \kappa_1 \left( t^{(1)T} R_{11} t^{(1)} \right)^2 + \cdots \right]

and


M_1\left( 0, t^{(2)} \right) = \exp\left[ \frac{1}{2} t^{(2)T} R_{22} t^{(2)} + \frac{1}{8} \kappa_1 \left( t^{(2)T} R_{21} R_{11}^{-1} R_{12} t^{(2)} \right)^2 + \cdots \right],


which shows that the marginal distribution of X^{(1)} is an h-dimensional
elliptical distribution (as it should) while that of X^{(2)} is not elliptical.
The second-order cumulants of (5.48) are the same as those for (5.47).


Fig. 5.5. Steyn’s bivariate pdf corresponding to (5.46) for t_3 = 0 and (a)
κ_1 = 0.8, κ_2 = −0.4, and r_12 = 0.2; (b) κ_1 = 0.8, κ_2 = −0.4, and r_12 = 0.8;
(c) κ_1 = −0.4, κ_2 = 0.8, and r_12 = 0.2; and (d) κ_1 = −0.4, κ_2 = 0.8, and
r_12 = 0.8
For the second construction, partition X^{(2)} into X^{(2)} = (X_{h+1}, ..., X_{h+s})^T
and X^{(3)} = (X_{h+s+1}, ..., X_p)^T, and let t^{(2)} be partitioned correspond-
ingly into t^{(2)} and t^{(3)}. Let C denote the conditional covariance matrix
of X^{(2)} given X^{(1)} = x^{(1)}, that is,


C = R_{22} - R_{21} R_{11}^{-1} R_{12},

and let C be partitioned as

C = \begin{pmatrix} C_{11} & C_{12} \\ C_{21} & C_{22} \end{pmatrix},

so that C_11 is s × s, C_12 is s × (p − h − s), and C_22 is (p − h − s) × (p − h − s).
Now, assuming that the distribution of X^{(1)} is elliptical with kurtosis
parameter κ_1 and that of X^{(2)} − E(X^{(2)} | X^{(1)} = x^{(1)}) is elliptical with
kurtosis parameter κ_2, one can show that the mgf (5.47) changes to


M_2(t) = M(t) \exp\left[ \frac{\kappa_1}{8} \left\{ \left( t^{(1)} + R_{11}^{-1} R_{12} t^{(2)} \right)^T R_{11} \left( t^{(1)} + R_{11}^{-1} R_{12} t^{(2)} \right) \right\}^2 + \cdots \right.
\left. + \frac{\kappa_2}{8} \left\{ \left( t^{(2)} + C_{11}^{-1} C_{12} t^{(3)} \right)^T C_{11} \left( t^{(2)} + C_{11}^{-1} C_{12} t^{(3)} \right) \right\}^2 + \cdots \right]    (5.49)


(in the first quadratic form t^{(2)} denotes the full (p − h)-vector before
its partition). This defines the mgf of a multivariate distribution that
is equal to the product of the mgf of the multivariate normal and a
function of two quadratic forms in t depending on the two kurtosis
parameters κ_i, i = 1, 2, and on the elements of the normal covariance
matrix. Setting t^{(2)} = 0 in (5.49), we see that X^{(1)} has an h-dimensional
elliptical distribution with zero means, covariance matrix R_11, and
kurtosis parameter κ_1 (as it should). If either t^{(1)} = 0 or t^{(1)} = 0 and
t^{(3)} = 0, then M_2(t) becomes a function of three different forms.


5.14 Inverted Dirichlet Distribution 


There is a close connection between the multivariate t distribution de-
fined by (1.1) and the inverted Dirichlet distribution (Cornish, 1954;
Dunnett and Sobel, 1954). To see this, consider the central p-variate t
distribution with the pdf


f(x) = \frac{\Gamma((\nu + p)/2)}{(\pi\nu)^{p/2} \, \Gamma(\nu/2) \, |R|^{1/2}} \left\{ 1 + \frac{x^T R^{-1} x}{\nu} \right\}^{-(\nu+p)/2}.

Upon transforming to the canonical variables Z = (Z_1, ..., Z_p)^T, Z =
PX, where P is a p × p matrix such that P^T P = R^{-1}, it is easily seen
that


f(z) = \frac{\Gamma((\nu + p)/2)}{(\pi\nu)^{p/2} \, \Gamma(\nu/2)} \left\{ 1 + \frac{z_1^2 + \cdots + z_p^2}{\nu} \right\}^{-(\nu+p)/2}.    (5.50)


In (5.50) now perform a further transformation T_i = Z_i²/ν, which is
one-to-one in each of 2^p regions with the Jacobian


|J| = \frac{\nu^{p/2}}{2^p \sqrt{t_1 \cdots t_p}}.

Consequently, the joint pdf of T^T = (T_1, ..., T_p) becomes

f(t) = \frac{\Gamma((\nu + p)/2)}{\pi^{p/2} \, \Gamma(\nu/2)} \prod_{i=1}^{p} t_i^{-1/2} \left\{ 1 + \sum_{i=1}^{p} t_i \right\}^{-(\nu+p)/2},


which is the inverted p-dimensional Dirichlet distribution D'(1/2, ...,
1/2; ν/2); see, for example, Kotz et al. (2000, Chapter 49).
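
The link to the inverted Dirichlet distribution can be checked by simulation. The sketch below is an illustration only: it draws the canonical variables Z as independent standard normals divided by a common √(χ²_ν/ν), forms T_i = Z_i²/ν, and compares the sample mean of each T_i with the inverted Dirichlet mean a_i/(b − 1) evaluated at a_i = 1/2 and b = ν/2, that is, 1/(ν − 2). The mean formula a_i/(b − 1) for D'(a_1, ..., a_p; b) is assumed here, and the function names are hypothetical.

```python
import math
import random


def inverted_dirichlet_mean(a_i, b):
    """Assumed mean of the i-th component of D'(a_1, ..., a_p; b); needs b > 1."""
    return a_i / (b - 1.0)


def sample_t_squares(p, nu, rng):
    """One draw of (T_1, ..., T_p) with T_i = Z_i^2 / nu, Z spherical t."""
    # All components share the same chi-squared denominator.
    scale = math.sqrt(rng.gammavariate(nu / 2.0, 2.0) / nu)
    return [(rng.gauss(0.0, 1.0) / scale) ** 2 / nu for _ in range(p)]


def monte_carlo_means(p, nu, n, seed=0):
    """Component-wise sample means of n simulated vectors."""
    rng = random.Random(seed)
    totals = [0.0] * p
    for _ in range(n):
        for i, value in enumerate(sample_t_squares(p, nu, rng)):
            totals[i] += value
    return [total / n for total in totals]
```

The comparison is meaningful only for ν > 2, and is stable only for ν > 4 where the variance of T_i is finite.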


6 
Probability Integrals 


There has been a very substantial amount of research carried out on 
probability integrals of multivariate ¢ distributions. Most of the work 
was done during the pre-computer era, but recently several computer 
programs have been written to evaluate probability integrals. 

Sections 6.1 to 6.7 by now may have lost some of their usefulness but 
are still of substantial historical interest in addition to their mathemati- 
cal value. We have decided to record these results in some detail in this 
book in spite of the fact that some of the expressions are quite lengthy 
and cumbersome. Sections 6.8 to 6.13 contain more practically relevant 
and modern results. 


6.1 Dunnett and Sobel’s Probability Integrals 


One of the earliest results on probability integrals is that due to Dunnett and Sobel (1954). Let $(X_1, X_2)$ have the central bivariate t distribution with degrees of freedom $\nu$ and the equicorrelation structure $r_{ij} = \rho$, $i \neq j$. The corresponding bivariate pdf is

$$f(x_1, x_2; \nu, \rho) = \frac{1}{2\pi\sqrt{1-\rho^2}} \left\{1 + \frac{x_1^2 + x_2^2 - 2\rho x_1 x_2}{\nu(1-\rho^2)}\right\}^{-(\nu+2)/2} \qquad (6.1)$$

with the probability integral

$$P(y_1, y_2; \nu, \rho) = \int_{-\infty}^{y_2} \int_{-\infty}^{y_1} f(x_1, x_2; \nu, \rho)\, dx_1\, dx_2. \qquad (6.2)$$


Let

$$x(m, y_1, y_2) = \frac{(y_1 - \rho y_2)^2}{(y_1 - \rho y_2)^2 + (1-\rho^2)(m + y_2^2)}$$

and let

$$I_{x(m,y_1,y_2)}(a, b) = \frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)} \int_0^{x(m,y_1,y_2)} w^{a-1}(1-w)^{b-1}\, dw$$

denote the incomplete beta function. Dunnett and Sobel (1954) evaluated exact expressions for (6.2) when $\nu$ takes on positive integer values. For even $\nu$ and odd $\nu$, they obtained


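$$
\begin{aligned}
P(y_1, y_2; \nu, \rho) ={}& \frac{1}{2\pi}\arctan\frac{\sqrt{1-\rho^2}}{-\rho} \\
&+ \frac{y_2}{4\sqrt{\nu\pi}} \sum_{j=1}^{\nu/2} \frac{\Gamma(j-1/2)}{\Gamma(j)} \left(1+\frac{y_2^2}{\nu}\right)^{1/2-j} \left[1 + \operatorname{sgn}(y_1-\rho y_2)\, I_{x(\nu,y_1,y_2)}\left(\tfrac12, j-\tfrac12\right)\right] \\
&+ \frac{y_1}{4\sqrt{\nu\pi}} \sum_{j=1}^{\nu/2} \frac{\Gamma(j-1/2)}{\Gamma(j)} \left(1+\frac{y_1^2}{\nu}\right)^{1/2-j} \left[1 + \operatorname{sgn}(y_2-\rho y_1)\, I_{x(\nu,y_2,y_1)}\left(\tfrac12, j-\tfrac12\right)\right]
\end{aligned}
\qquad (6.3)
$$

and

$$
\begin{aligned}
P(y_1, y_2; \nu, \rho) ={}& \frac{1}{2\pi}\arctan\left\{\frac{\sqrt{\nu}\,\left[-\alpha\beta - \gamma\sqrt{\delta}\right]}{\gamma\beta - \nu\alpha\sqrt{\delta}}\right\} \\
&+ \frac{y_2}{4\sqrt{\nu\pi}} \sum_{j=1}^{(\nu-1)/2} \frac{\Gamma(j)}{\Gamma(j+1/2)} \left(1+\frac{y_2^2}{\nu}\right)^{-j} \left[1 + \operatorname{sgn}(y_1-\rho y_2)\, I_{x(\nu,y_1,y_2)}\left(\tfrac12, j\right)\right] \\
&+ \frac{y_1}{4\sqrt{\nu\pi}} \sum_{j=1}^{(\nu-1)/2} \frac{\Gamma(j)}{\Gamma(j+1/2)} \left(1+\frac{y_1^2}{\nu}\right)^{-j} \left[1 + \operatorname{sgn}(y_2-\rho y_1)\, I_{x(\nu,y_2,y_1)}\left(\tfrac12, j\right)\right],
\end{aligned}
\qquad (6.4)
$$

respectively. Here,

$$\alpha = y_1 + y_2,$$

$$\beta = y_1 y_2 + \rho\nu,$$

$$\gamma = y_1 y_2 - \nu,$$

$$\delta = y_1^2 - 2\rho y_1 y_2 + y_2^2 + \nu(1-\rho^2).$$

In the special case $y_1 = y_2 = 0$, both (6.3) and (6.4) reduce to the neat expression

$$P(0, 0; \nu, \rho) = \frac{1}{2\pi}\arctan\frac{\sqrt{1-\rho^2}}{-\rho}, \qquad (6.5)$$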


which is independent of $\nu$ and is therefore identical with the corresponding result for the bivariate normal integral. Since the number of terms in (6.3) and (6.4) increases with $\nu$, the usefulness of these expressions is confined to small values of $\nu$. Dunnett and Sobel (1954) also derived an asymptotic expansion in powers of $1/\nu$, the first few terms of which yield a good approximation to the probability integral even for moderately small values of $\nu$. The method of derivation is essentially the same as that used by Fisher (1925) to approximate the probability integral of the univariate Student's t distribution: Express the difference $f(x_1, x_2; \nu, \rho) - f(x_1, x_2; \infty, \rho)$ as a power series in $1/\nu$ and then integrate this series term by term over the desired region of integration. Setting

$$r^2 = \frac{y_1^2 - 2\rho y_1 y_2 + y_2^2}{1-\rho^2},$$

Dunnett and Sobel obtained

$$
\begin{aligned}
\frac{f(y_1, y_2; \nu, \rho)}{f(y_1, y_2; \infty, \rho)} ={}& 1 + \left(\frac{r^4}{4} - r^2\right)\frac{1}{\nu} + \left(\frac{r^8}{32} - \frac{5r^6}{12} + r^4\right)\frac{1}{\nu^2} \\
&+ \left(\frac{r^{12}}{384} - \frac{7r^{10}}{96} + \frac{13r^8}{24} - r^6\right)\frac{1}{\nu^3} \\
&+ \left(\frac{r^{16}}{6144} - \frac{r^{14}}{128} + \frac{17r^{12}}{144} - \frac{77r^{10}}{120} + r^8\right)\frac{1}{\nu^4} + \cdots \\
={}& 1 + D(r),
\end{aligned}
$$

say. Thus, the desired probability integral is

$$
P(y_1, y_2; \nu, \rho) = \int_{-\infty}^{y_2}\int_{-\infty}^{y_1} f(x_1, x_2; \infty, \rho)\, dx_1\, dx_2 + \int_{-\infty}^{y_2}\int_{-\infty}^{y_1} D(r)\, f(x_1, x_2; \infty, \rho)\, dx_1\, dx_2. \qquad (6.6)
$$
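For comparison with the expansion (6.6), the exact even-$\nu$ expression (6.3) can be evaluated directly. The following is only a sketch under stated assumptions: the function names are hypothetical, the arctangent in the leading term is taken in $(0, \pi)$ (via `atan2`) so that $P(0,0;\nu,0) = 1/4$, and the incomplete beta function $I_x(1/2, b)$ is computed by Simpson's rule after the substitution $w = \sin^2\theta$ rather than by any method used in the original paper.

```python
import math


def x_arg(m, y1, y2, rho):
    """x(m, y1, y2) as defined before (6.3)."""
    num = (y1 - rho * y2) ** 2
    return num / (num + (1.0 - rho ** 2) * (m + y2 ** 2))


def inc_beta_half(x, b, n=2000):
    """Regularized incomplete beta I_x(1/2, b) by Simpson's rule.

    With w = sin(theta)^2 the integrand becomes 2*cos(theta)^(2b-1),
    which is smooth on [0, arcsin(sqrt(x))]. n must be even.
    """
    upper = math.asin(math.sqrt(min(max(x, 0.0), 1.0)))
    h = upper / n
    total = 0.0
    for i in range(n + 1):
        weight = 1 if i in (0, n) else (4 if i % 2 else 2)
        c = max(math.cos(i * h), 0.0)
        total += weight * 2.0 * c ** (2.0 * b - 1.0)
    log_norm = math.lgamma(0.5 + b) - math.lgamma(0.5) - math.lgamma(b)
    return math.exp(log_norm) * total * h / 3.0


def bvt_cdf_even(y1, y2, nu, rho):
    """P(y1, y2; nu, rho) for even nu, following (6.3)."""
    if nu <= 0 or nu % 2 != 0:
        raise ValueError("nu must be a positive even integer")
    # Leading term, with the angle taken in (0, pi).
    total = math.atan2(math.sqrt(1.0 - rho ** 2), -rho) / (2.0 * math.pi)
    # a multiplies the sum; b is the other coordinate.
    for a, b in ((y2, y1), (y1, y2)):
        x = x_arg(nu, b, a, rho)
        diff = b - rho * a
        sign = 0.0 if diff == 0.0 else math.copysign(1.0, diff)
        s = 0.0
        for j in range(1, nu // 2 + 1):
            coef = math.exp(math.lgamma(j - 0.5) - math.lgamma(j))
            s += coef * (1.0 + a * a / nu) ** (0.5 - j) * (
                1.0 + sign * inc_beta_half(x, j - 0.5))
        total += a * s / (4.0 * math.sqrt(nu * math.pi))
    return total
```

Letting one limit grow large should recover the univariate Student t cdf of the other, which gives a convenient sanity check for small $\nu$.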


The first term on the right-hand side of (6.6) is the integral of the bivariate normal pdf, and it has been tabulated by Pearson (1931) with a series of correction terms. The second term can be integrated term by term to obtain an asymptotic expansion in powers of $1/\nu$. Dunnett and Sobel gave expressions for the coefficients $A_k$ of the terms $1/\nu^k$ for $k = 1, 2, 3, 4$. The first of these coefficients takes the form

$$A_1 = \frac{2y_1 y_2 - \rho(y_1^2 + y_2^2)}{4\sqrt{1-\rho^2}}\, \phi(y_1)\phi(u_2) - \frac{y_1(1+y_1^2)}{4}\, \phi(y_1)\Phi(u_2) - \frac{y_2(1+y_2^2)}{4}\, \phi(y_2)\Phi(u_1),$$

where $\phi$ and $\Phi$ are, respectively, the pdf and the cdf of the standard normal distribution, and

$$u_1 = \frac{y_1 - \rho y_2}{\sqrt{1-\rho^2}}, \qquad u_2 = \frac{y_2 - \rho y_1}{\sqrt{1-\rho^2}}.$$
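The series in powers of $1/\nu$ that generates these coefficients can be checked numerically. The sketch below assumes that the density ratio depends on $(y_1, y_2)$ only through $r^2$, so that it equals $(1 + r^2/\nu)^{-(\nu+2)/2} \exp(r^2/2)$ by (6.1); the function names are illustrative only.

```python
import math


def ratio_exact(r2, nu):
    """f(.; nu, rho) / f(.; infinity, rho) written as a function of r^2."""
    return (1.0 + r2 / nu) ** (-(nu + 2.0) / 2.0) * math.exp(r2 / 2.0)


def ratio_series(r2, nu):
    """1 + D(r), truncated after the 1/nu^4 term (u stands for r^2)."""
    u = r2
    c1 = u ** 2 / 4 - u
    c2 = u ** 4 / 32 - 5 * u ** 3 / 12 + u ** 2
    c3 = u ** 6 / 384 - 7 * u ** 5 / 96 + 13 * u ** 4 / 24 - u ** 3
    c4 = (u ** 8 / 6144 - u ** 7 / 128 + 17 * u ** 6 / 144
          - 77 * u ** 5 / 120 + u ** 4)
    return 1.0 + c1 / nu + c2 / nu ** 2 + c3 / nu ** 3 + c4 / nu ** 4
```

The truncation error should shrink roughly like $1/\nu^5$ as $\nu$ grows, which is the behavior one expects of the first neglected term.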


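In the special case $y_1 = y_2 = y$, (6.6) reduces to

$$P(y, y; \nu, \rho) = \int_{-\infty}^{y}\int_{-\infty}^{y} f(x_1, x_2; \infty, \rho)\, dx_1\, dx_2 + \frac{A_1}{\nu} + \frac{A_2}{\nu^2} + \frac{A_3}{\nu^3} + \frac{A_4}{\nu^4} + \cdots \qquad (6.7)$$

with the first two coefficients $A_1$ and $A_2$ now taking the forms

$$A_1 = -\frac{1}{2}\, y\phi(y) \left\{(y^2+1)\Phi(cy) - cy\,\Phi'(cy)\right\}$$

and

$$A_2 = -\frac{1}{48}\, y\phi(y) \left\{(3y^6 - 7y^4 - 5y^2 - 3)\Phi(cy) - cy\,\Phi'(cy)\left[3y^4(c^4 + 3c^2 + 3) - y^2(c^2 + 5) - 3\right]\right\},$$

where $c = \sqrt{(1-\rho)/(1+\rho)}$. In this special case, Dunnett and Sobel (1954) tabulated numerical values of the coefficients $A_k$ for selected values of $\rho$, $y$, and $\nu$. The following table gives the values for $\rho = 0.5$.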


Coefficients of the asymptotic expansion (6.7) for $\rho = 0.5$

   y        A_1          A_2          A_3          A_4
  0.25   -0.025870     0.003371     0.003816    -0.001050
  0.50   -0.057784     0.008999     0.006868    -0.002155
  0.75   -0.100016     0.021983     0.006891    -0.001879
  1.00   -0.150182     0.047374    -0.006835     0.007991
  1.25   -0.198378     0.079687    -0.033130     0.036817
  1.50   -0.231628     0.096254    -0.038696     0.032808
  1.75   -0.240531     0.067469     0.052274    -0.191482
  2.00   -0.223682    -0.020268     0.293449    -0.819219
  2.25   -0.187525    -0.149011     0.623867    -1.618705
  2.50   -0.142571    -0.276255     0.858993    -1.765249
  3.00   -0.062685    -0.376815     0.432592     2.236773

These values can be used to construct tables for the probability integral in (6.7).
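The closed forms for the first two coefficients of (6.7) can be compared against the tabulated entries. This is a minimal sketch, assuming $c = \sqrt{(1-\rho)/(1+\rho)}$ and $\Phi' = \phi$; the helper names `coeff_a1` and `coeff_a2` are hypothetical and not taken from Dunnett and Sobel.

```python
import math


def phi(x):
    """Standard normal pdf."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def Phi(x):
    """Standard normal cdf via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def coeff_a1(y, rho):
    """A_1 of (6.7) in the equicoordinate case y1 = y2 = y."""
    c = math.sqrt((1.0 - rho) / (1.0 + rho))
    return -0.5 * y * phi(y) * ((y * y + 1.0) * Phi(c * y)
                                - c * y * phi(c * y))


def coeff_a2(y, rho):
    """A_2 of (6.7) in the equicoordinate case y1 = y2 = y."""
    c = math.sqrt((1.0 - rho) / (1.0 + rho))
    poly = 3 * y ** 6 - 7 * y ** 4 - 5 * y ** 2 - 3
    inner = (3 * y ** 4 * (c ** 4 + 3 * c ** 2 + 3)
             - y ** 2 * (c ** 2 + 5) - 3)
    return -y * phi(y) / 48.0 * (poly * Phi(c * y)
                                 - c * y * phi(c * y) * inner)
```

For example, `coeff_a1(1.0, 0.5)` and `coeff_a2(1.0, 0.5)` should agree with the $y = 1.00$ row of the table to about six decimals.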


6.2 Gupta and Sobel’s Probability Integrals 


Gupta and Sobel (1957) investigated the special case when X follows the central p-variate t distribution with degrees of freedom $\nu$ and the correlation structure $r_{ij} = \rho = 1/2$, $i \neq j$. If $Y_1, Y_2, \ldots, Y_p, Y$ are independent normal random variables with common mean and common variance $\sigma^2$, and if $\nu S^2/\sigma^2$ is a chi-squared random variable with degrees of freedom $\nu$, independent of $Y_1, Y_2, \ldots, Y_p, Y$, then one can rewrite the probability integral as

$$
\begin{aligned}
P(d) &= \int_{-\infty}^{d} \cdots \int_{-\infty}^{d} f(x_1, \ldots, x_p; \nu, \rho)\, dx_p \cdots dx_1 \\
&= \Pr\left\{\frac{Y_i - Y}{\sqrt{2}\,S} \le d,\ i = 1, \ldots, p\right\} \\
&= \Pr\left(\frac{M_p - Y}{S} \le \sqrt{2}\,d\right) \\
&= \Pr\left(Z \le \sqrt{2}\,d\right),
\end{aligned}
\qquad (6.8)
$$

where $M_p = \max(Y_1, Y_2, \ldots, Y_p)$ and $Z = (M_p - Y)/S$. Gupta and Sobel (1957) provided four useful expressions for P(d). These are by now classical results applicable in statistical inference. The first expression is derived by fixing Y and S in (6.8) and integrating with respect to $M_p$:


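$$P(d) = \int_0^\infty h(s) \left\{\int_{-\infty}^{\infty} \Phi^p(y)\, \phi\left(y - \sqrt{2}\,ds\right) dy\right\} ds, \qquad (6.9)$$

where $\phi$ and $\Phi$ are, respectively, the pdf and cdf of the standard normal distribution and h is the pdf of $S/\sigma$, that is, of $\chi_\nu/\sqrt{\nu}$, where $\chi_\nu^2$ is chi-squared with $\nu$ degrees of freedom. Based on the fact that the pdf $\phi$ admits an expansion about d = 0, it is easy to justify a term-by-term integration of (6.9) to obtain the second expression

$$P(d) = \frac{1}{p+1} \sum_{k=0}^{\infty} \frac{2^{k/2} d^k}{k!}\, A_k\, E\left\{H_k\left(M^*_{p+1}\right)\right\},$$

where $M^*_{p+1}$ is the largest of p + 1 independent standard normal random variables, so that $E\{H_k(M^*_{p+1})\} = (p+1)\int_{-\infty}^{\infty} H_k(y)\Phi^p(y)\phi(y)\,dy$,

$$A_k = E\left\{\left(\frac{S}{\sigma}\right)^k\right\} = \left(\frac{2}{\nu}\right)^{k/2} \frac{\Gamma\left(\frac{\nu+k}{2}\right)}{\Gamma\left(\frac{\nu}{2}\right)} \qquad (6.10)$$

is the kth moment of $\chi_\nu/\sqrt{\nu}$ (provided that $k > -\nu$), and $H_k$ is the kth Hermite polynomial defined by

$$\left(-\frac{d}{dx}\right)^k \exp\left(-\frac{x^2}{2}\right) = H_k(x) \exp\left(-\frac{x^2}{2}\right). \qquad (6.11)$$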
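The two ingredients of the second expression, the moments $A_k$ of (6.10) and the Hermite polynomials of (6.11), are easy to compute. The sketch below assumes the probabilists' convention for $H_k$ (so that $H_1(x) = x$ and $H_2(x) = x^2 - 1$) and uses the three-term recurrence $H_{k+1}(x) = xH_k(x) - kH_{k-1}(x)$; the function names are illustrative.

```python
import math


def moment_a(k, nu):
    """A_k = E{(S/sigma)^k} from (6.10); requires k > -nu."""
    if k <= -nu:
        raise ValueError("the moment exists only for k > -nu")
    return math.exp(0.5 * k * math.log(2.0 / nu)
                    + math.lgamma(0.5 * (nu + k))
                    - math.lgamma(0.5 * nu))


def hermite(k, x):
    """Probabilists' Hermite polynomial H_k(x) by recurrence."""
    h_prev, h = 0.0, 1.0  # placeholder for H_{-1}, then H_0
    for j in range(k):
        h_prev, h = h, x * h - j * h_prev
    return h
```

As a quick check, `moment_a(2, nu)` equals 1 for every $\nu$ because $E(S^2/\sigma^2) = 1$, and the negative-order moments `moment_a(-1, nu)` and `moment_a(-2, nu)` are the ones that enter E(Z) and Var(Z) in the fourth expression of this section.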


A third expression for P(d) is derived by first expanding $\phi$ about $S = \sigma$ and then integrating term by term, obtaining

$$
\begin{aligned}
P(d) ={}& \int_{-\infty}^{\infty} \Phi^p(y)\, \phi\left(y - \sqrt{2}\,d\right) dy \\
&- \sqrt{2}\,d\,(1 - A_1) \int_{-\infty}^{\infty} \left(y - \sqrt{2}\,d\right) \Phi^p(y)\, \phi\left(y - \sqrt{2}\,d\right) dy \\
&+ 2d^2 (1 - A_1) \int_{-\infty}^{\infty} \left\{y^2 - 2\sqrt{2}\,dy + 2d^2 - 1\right\} \Phi^p(y)\, \phi\left(y - \sqrt{2}\,d\right) dy + \cdots,
\end{aligned}
$$

where $A_1$ is given by (6.10). Each of the integrals above can be evaluated by expanding the pdf $\phi$ about d = 0, as was done in (6.9). The fourth and final expression for P(d) given by Gupta and Sobel (1957) uses the result of Seal (1954) that the distribution of $D = (M_p - Y)/\sigma$ is asymptotically normal as p tends to infinity. It follows directly from Seal's result that the third and higher central moments of D tend to the corresponding moments of the standard normal distribution. Since the coefficients involving $\nu$ in $A_k$ in (6.10) tend to unity as $\nu \to \infty$, it follows that the third and higher central moments of $Z = (M_p - Y)/S$ tend to the corresponding moments of the standard normal distribution as both $\nu$ and p tend to infinity. It is therefore reasonable to approximate the distribution of $W = (Z - E(Z))/\sqrt{\mathrm{Var}(Z)}$ by a Gram-Charlier expansion in the Edgeworth form, where

$$E(Z) = A_{-1}\, a_{p,1}$$

and

$$\mathrm{Var}(Z) = A_{-2}\,(a_{p,2} + 1) - (A_{-1}\, a_{p,1})^2.$$

Here, $a_{p,i}$ denotes the ith moment of the largest of p independent standard normal random variables. Using equation (17.7.3) of Cramér (1951) and letting $d_s = (\sqrt{2}\,d - E(Z))/\sqrt{\mathrm{Var}(Z)}$, Gupta and Sobel obtained

$$
\begin{aligned}
P(d) ={}& \Pr\left(Z \le \sqrt{2}\,d\right) \\
={}& \Phi(d_s) - \frac{\alpha_3}{3!}\Phi^{(3)}(d_s) + \frac{\alpha_4}{4!}\Phi^{(4)}(d_s) + \frac{10\alpha_3^2}{6!}\Phi^{(6)}(d_s) \\
&- \frac{\alpha_5}{5!}\Phi^{(5)}(d_s) - \frac{35\alpha_3\alpha_4}{7!}\Phi^{(7)}(d_s) - \frac{280\alpha_3^3}{9!}\Phi^{(9)}(d_s) + \cdots,
\end{aligned}
$$

where

$$\alpha_k = \frac{\kappa_k}{\{\mathrm{Var}(Z)\}^{k/2}} \qquad (6.12)$$

is the kth standardized cumulant of Z (with $\kappa_k$ the kth cumulant of Z) obtained from the moments around the origin.

In a related development, Gupta (1963) studied the above case $\rho = 1/2$ and showed that $P(d) = P(d; \nu)$ satisfies

$$\frac{dP(d; \nu)}{dd} + \nu\left\{P(d; \nu) - P(d; \nu+2)\right\} = 0, \qquad (6.13)$$

which is Hartley's differential-difference equation for the probability integral of a general class of statistics known as Studentized statistics. Using Hartley's solution (obtained using the theory of characteristics), Gupta obtained an approximation for $P(d; \nu)$ in powers of $1/\nu$ and remarked that it can be computed by using the Gauss-Hermite quadrature. Gupta et al. (1985) extended this result for any $\rho > 0$ and showed that P(d) satisfies (6.13) in this case too. In this case the approximation for P(d) in powers of $1/\nu$ is

$$P(d) = G(d, \ldots, d) + \sum_{k=1}^{\infty} L_k(d), \qquad (6.14)$$

where $L_k$ is the kth correction term and G is the joint cdf of a p-variate normal distribution with zero means, common variance $\sigma^2$, and the equicorrelation structure $r_{ij} = \rho$, $i \neq j$. Letting $G^{(k)}(d)$ denote the kth-order derivative of $G(d, \ldots, d)$ with respect to d, the first four correction terms can be written as

$$L_1(d) = \frac{1}{\nu}\left\{a^{(2)} - a^{(1)}\right\},$$

$$L_2(d) = \frac{1}{6\nu^2}\left\{3a^{(4)} - 10a^{(3)} + 9a^{(2)} - 2a^{(1)}\right\},$$

$$L_3(d) = \frac{1}{6\nu^3}\left\{a^{(6)} - 7a^{(5)} + 17a^{(4)} - 17a^{(3)} + 6a^{(2)}\right\},$$

and

$$L_4(d) = \frac{1}{360\nu^4}\left\{15a^{(8)} - 180a^{(7)} + 830a^{(6)} - 1848a^{(5)} + 2015a^{(4)} - 900a^{(3)} + 20a^{(2)} + 48a^{(1)}\right\},$$

where

$$a^{(k)} = \frac{\psi^{(k)}(d)}{2^k}, \qquad k = 1, 2, \ldots, 8,$$

and the first eight $\psi^{(k)}(d)$ are

$$
\begin{aligned}
\psi^{(1)}(d) ={}& dG^{(1)}(d), \\
\psi^{(2)}(d) ={}& d^2G^{(2)}(d) + dG^{(1)}(d), \\
\psi^{(3)}(d) ={}& d^3G^{(3)}(d) + 3d^2G^{(2)}(d) + dG^{(1)}(d), \\
\psi^{(4)}(d) ={}& d^4G^{(4)}(d) + 6d^3G^{(3)}(d) + 7d^2G^{(2)}(d) + dG^{(1)}(d), \\
\psi^{(5)}(d) ={}& d^5G^{(5)}(d) + 10d^4G^{(4)}(d) + 25d^3G^{(3)}(d) + 15d^2G^{(2)}(d) + dG^{(1)}(d), \\
\psi^{(6)}(d) ={}& d^6G^{(6)}(d) + 15d^5G^{(5)}(d) + 65d^4G^{(4)}(d) + 90d^3G^{(3)}(d) + 31d^2G^{(2)}(d) + dG^{(1)}(d), \\
\psi^{(7)}(d) ={}& d^7G^{(7)}(d) + 21d^6G^{(6)}(d) + 140d^5G^{(5)}(d) + 350d^4G^{(4)}(d) \\
&+ 301d^3G^{(3)}(d) + 63d^2G^{(2)}(d) + dG^{(1)}(d), \\
\psi^{(8)}(d) ={}& d^8G^{(8)}(d) + 28d^7G^{(7)}(d) + 266d^6G^{(6)}(d) + 1050d^5G^{(5)}(d) \\
&+ 1701d^4G^{(4)}(d) + 966d^3G^{(3)}(d) + 127d^2G^{(2)}(d) + dG^{(1)}(d).
\end{aligned}
\qquad (6.15)
$$

Thus the evaluation of P(d) in (6.14) involves that of $G^{(k)}$ for $k = 0, 1, \ldots, 8$, and we shall discuss in Chapter 8 how the latter can be performed.
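The coefficients of the four correction terms can be sanity-checked with exact rational arithmetic. The sketch below rests on two assumptions that are not stated in the source: that (6.14) is a formal expansion of $E\{G(dS/\sigma, \ldots, dS/\sigma)\}$ in powers of $1/\nu$, and that $a^{(k)} = \psi^{(k)}(d)/2^k$ with $\psi^{(k)} = (d\, \partial/\partial d)^k G$. Under these assumptions the monomial test function $G(d) = d^{2m}$ (not a cdf, only a convenient probe) gives $a^{(k)} = m^k d^{2m}$ and $E\{S^{2m}/\sigma^{2m}\} = \prod_{j=0}^{m-1}(1 + 2j/\nu)$, which is a polynomial of degree $m-1$ in $1/\nu$.

```python
from fractions import Fraction


def correction_terms(a, nu):
    """L_1..L_4 of (6.14), given a[k] = a^(k)(d) for k = 1..8."""
    nu = Fraction(nu)
    l1 = (a[2] - a[1]) / nu
    l2 = (3 * a[4] - 10 * a[3] + 9 * a[2] - 2 * a[1]) / (6 * nu ** 2)
    l3 = (a[6] - 7 * a[5] + 17 * a[4] - 17 * a[3] + 6 * a[2]) / (6 * nu ** 3)
    l4 = (15 * a[8] - 180 * a[7] + 830 * a[6] - 1848 * a[5]
          + 2015 * a[4] - 900 * a[3] + 20 * a[2] + 48 * a[1]) / (360 * nu ** 4)
    return [l1, l2, l3, l4]


def check_monomial(m, nu):
    """Return (1 + L_1 + ... + L_4, exact moment) for G(d) = d^(2m), d = 1."""
    a = {k: Fraction(m) ** k for k in range(1, 9)}
    approx = 1 + sum(correction_terms(a, nu))
    exact = Fraction(1)
    for j in range(m):
        exact *= Fraction(nu + 2 * j, nu)
    return approx, exact
```

For $m \le 5$ the two returned values should coincide exactly, since the exact moment has no terms beyond $1/\nu^4$.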


6.3 John’s Probability Integrals 


John (1961) provided alternative formulas for the evaluation of the probability integral. Although the method is discussed in detail only for the bivariate case, it has wider applicability in the sense that it can be adopted to obtain the probability integral of the multivariate t distribution for any dimension.

Let X be a p-variate vector having the central t distribution with degrees of freedom $\nu$ and correlation matrix R. Using the definition that X can be represented as $(Z_1/S, Z_2/S, \ldots, Z_p/S)$, where Z is a p-variate normal random vector with correlation matrix R and $\nu S^2/\sigma^2$ is an independent chi-squared random variable with degrees of freedom $\nu$, one can show that the characteristic function of X is

$$
\begin{aligned}
E\left(\exp\left(it^T X\right)\right) &= E\left(E\left(\exp\left(it^T Z/s\right) \mid S = s\right)\right) \\
&= \frac{1}{\Gamma(\nu/2)} \int_0^\infty z^{\nu/2-1} \exp\left(-z - \frac{\nu}{4z}\, t^T R\, t\right) dz.
\end{aligned}
$$

In the case p = 2 with the equicorrelation structure $r_{ij} = \rho$, $i \neq j$, the above expression reduces to

$$
E\left(\exp\left(it_1X_1 + it_2X_2\right)\right) = \frac{1}{\Gamma(\nu/2)} \int_0^\infty z^{\nu/2-1} \left\{\sum_{i=0}^{\infty} \frac{1}{i!}\left(-\frac{\nu\rho\, t_1 t_2}{2z}\right)^i\right\} \exp\left\{-z - \frac{\nu}{4z}\left(t_1^2 + t_2^2\right)\right\} dz.
$$

By the inversion theorem, John (1961) derived the corresponding joint pdf as an infinite series of one-dimensional integrals. Integrating the infinite series term by term, the probability integral becomes

$$P(y_1, y_2; \nu, \rho) = y_{\nu,0}(y_1, y_2) + \frac{1}{2\pi} \sum_{i=1}^{\infty} \frac{\rho^i}{i!}\, y_{\nu,i}(y_1, y_2),$$

where

$$y_{\nu,0}(y_1, y_2) = \frac{1}{\Gamma(\nu/2)} \int_0^\infty x^{\nu/2-1} \exp(-x)\, \Phi\left(\sqrt{\frac{2x}{\nu}}\, y_1\right) \Phi\left(\sqrt{\frac{2x}{\nu}}\, y_2\right) dx$$

and

$$y_{\nu,i}(y_1, y_2) = \frac{1}{\Gamma(\nu/2)} \int_0^\infty x^{\nu/2-1} \exp\left[-x\left\{1 + \frac{y_1^2 + y_2^2}{\nu}\right\}\right] H_{i-1}\left(\sqrt{\frac{2x}{\nu}}\, y_1\right) H_{i-1}\left(\sqrt{\frac{2x}{\nu}}\, y_2\right) dx$$

for $i = 1, 2, \ldots$. Here, $\Phi(\cdot)$ is the cdf of the standard normal distribution and $H_k$ denotes the Hermite polynomial of order k defined by (6.11). John provided explicit algebraic expressions for $y_{\nu,i}$ for $i = 1, 2, \ldots, 6$. The first three of them are

$$y_{\nu,1}(y_1, y_2) = z^{-\nu/2},$$

$$y_{\nu,2}(y_1, y_2) = y_1 y_2\, z^{-(\nu+2)/2},$$

and

$$y_{\nu,3}(y_1, y_2) = \left(1 + \frac{2}{\nu}\right) y_1^2 y_2^2\, z^{-(\nu+4)/2} - \left(y_1^2 + y_2^2\right) z^{-(\nu+2)/2} + z^{-\nu/2},$$

where $z = (y_1^2 + y_2^2)/\nu + 1$. In principle, explicit expressions for $y_{\nu,i}$ can be obtained for any $i \ge 1$. To evaluate $y_{\nu,0}$, the integration has to be done numerically. John tabulated values of this quantity for $\nu$ = 11, 12 using Gauss' formula for a numerical quadrature (Kopal, 1955, page 371). He also provided several useful recursion relations. For example, values of $y_{\nu,0}(y_1, y_2)$ for $y_1$ negative or $y_2$ negative or both negative can be found from the formulas

$$y_{\nu,0}(y_1, y_2) = T_\nu(y_2) - y_{\nu,0}(-y_1, y_2),$$

$$y_{\nu,0}(y_1, y_2) = T_\nu(y_1) - y_{\nu,0}(y_1, -y_2),$$

and

$$y_{\nu,0}(y_1, y_2) = 1 + y_{\nu,0}(-y_1, -y_2) - T_\nu(-y_1) - T_\nu(-y_2),$$

where $T_\nu$ is the cdf of the Student's t distribution with $\nu$ degrees of freedom.
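At the origin John's series can be summed without any quadrature, which gives a simple consistency check against (6.5). The sketch assumes the series has the form $P = y_{\nu,0} + (2\pi)^{-1}\sum_{i \ge 1} (\rho^i/i!)\, y_{\nu,i}$ with the probabilists' Hermite polynomials, so that $y_{\nu,0}(0,0) = 1/4$ and $y_{\nu,i}(0,0) = H_{i-1}(0)^2$; the function names are illustrative.

```python
import math


def hermite_at_zero(k):
    """H_k(0) for probabilists' Hermite polynomials: 0 for odd k,
    (-1)^m (2m-1)!! for k = 2m."""
    if k % 2:
        return 0
    m = k // 2
    value = 1
    for j in range(1, 2 * m, 2):
        value *= j
    return -value if m % 2 else value


def john_series_origin(rho, terms=400):
    """P(0, 0; nu, rho) from the series; the result does not depend on nu."""
    total = 0.25  # y_{nu,0}(0, 0) = Phi(0)^2
    for i in range(1, terms + 1):
        # Exact integer ratio, converted to float only after division.
        coef = hermite_at_zero(i - 1) ** 2 / math.factorial(i)
        total += rho ** i * coef / (2.0 * math.pi)
    return total
```

The sum reproduces $1/4 + \arcsin(\rho)/(2\pi)$, which is the value given by (6.5) when the arctangent is taken in $(0, \pi)$.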


6.4 Amos and Bulgren’s Probability Integrals 


In a widely quoted paper, Amos and Bulgren (1969) derived several 
representations for (6.2) in terms of series and simple one-dimensional 
quadratures, together with efficient computational procedures for the 
special functions used in their numerical evaluation. One of the quadra- 
ture formulas given is 


1 
Qn(v +1)(1 +7? + ¥2)"/2 


x if oF, (1 53 SENI — c? cos? (0 — é)) dô 
= e+» 
Val (y/2)(1 +7 +93)" 
> a I {cos(@ — $) < 0} cos(6 — ¢) 
o {1 - 2 cos?(0 — gy}? 


where ${}_2F_1$ is the Gauss hypergeometric function, $I\{\cdot\}$ is the indicator function,
_ “+H 
Cra 2 ae? 
1+ ty 


P = 


, 


Ài 
n = (tml) 
A 
v = m-ni 
I 
EE N ERA 
Lap 


fl 
02 = Ren i
$\phi = \arctan(\eta_2/\eta_1)$, if $\eta_1 > 0$,
$\phi = \pi + \arctan(\eta_2/\eta_1)$, if $\eta_1 < 0$,


1 
Àl = Sa 
1+p 
and 
1 
à = —. 
1-p 


One of the series formulas given is 


c) P(x + k)/2) 


EEE ag E e ie) 
2AT) A 1+ 73 + AA T k)/2) 
62 
: f cos* (0 — ¢)d8. (6.16) 
41 
For the special case v = 1, P can be reduced to the closed-form expres- 
sion 
1 2v 2, 2 
PiS arctan (=) +I {u +v’ < 1}, 
where 
i 2r sin ġ 
~ A(1+r?+2rcosġ)’ 
Ti l-r? 
~ A(1+r?+2rcosġ)’ 
fen VIS 2 
I+VII ER 
and 


= 6. — T 
A = tan ( 5 | 


If in addition $\rho = 0$, then the expression for P reduces further to

$$P = \frac{1}{2\pi} \left\{\arctan\left(\frac{y_1 y_2}{\sqrt{1 + y_1^2 + y_2^2}}\right) + \arctan y_1 + \arctan y_2 + \frac{\pi}{2}\right\}.$$


The advantage of these expressions over the ones given by Dunnett and Sobel (1954) is that these are easier to compute, especially for large degrees of freedom. For instance, the integral in $\theta$ in (6.16) can be expressed in terms of incomplete beta functions that are extensively tabulated. Amos and Bulgren (1969) numerically evaluated values of P for all combinations of $\rho$ = -0.9, -0.5, 0, 0.5, 0.9 and $\nu$ = 1, 2, 5, 10, 25, 50.
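The closed form for $\nu = 1$ and $\rho = 0$ is short enough to code directly. This is a minimal sketch (the function name is hypothetical); it relies only on the formula as written, namely $P = (2\pi)^{-1}\{\arctan(y_1y_2/\sqrt{1+y_1^2+y_2^2}) + \arctan y_1 + \arctan y_2 + \pi/2\}$.

```python
import math


def bvt_cdf_nu1_rho0(y1, y2):
    """P(y1, y2) for nu = 1 and rho = 0 (bivariate Cauchy-type case)."""
    cross = math.atan(y1 * y2 / math.sqrt(1.0 + y1 * y1 + y2 * y2))
    return (cross + math.atan(y1) + math.atan(y2)
            + 0.5 * math.pi) / (2.0 * math.pi)
```

Two properties make useful checks: the value at the origin is 1/4, and $P(y_1, y_2) + P(-y_1, y_2)$ equals the univariate Cauchy cdf $1/2 + \arctan(y_2)/\pi$.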


6.5 Steffens’ Noncentral Probabilities 
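
Consider the p-variate noncentral t distribution defined in (5.1). Motivated by the Studentized maximum and minimum modulus tests, Steffens (1970) studied the particular case for p = 2 and $R = I_2$. In this case, the joint pdf (5.1) reduces to

$$
f(x_1, x_2) = \frac{\exp\left\{-(\xi_1^2 + \xi_2^2)/2\right\}}{\pi\nu\,\Gamma(\nu/2)} \sum_{k=0}^{\infty}\sum_{l=0}^{\infty} \frac{\Gamma((\nu+k+l)/2 + 1)}{k!\, l!} \left(\frac{\sqrt{2}\,\xi_1 x_1}{\sqrt{\nu}}\right)^k \left(\frac{\sqrt{2}\,\xi_2 x_2}{\sqrt{\nu}}\right)^l \left(1 + \frac{x_1^2 + x_2^2}{\nu}\right)^{-(\nu+k+l+2)/2},
$$

where $\xi_j = \mu_j/\sigma$ are the noncentrality parameters and $\nu$ denotes the degrees of freedom. The testing procedures involve maximum or minimum values of the components $X_1$ and $X_2$ and the computation of the corresponding probabilities. For this reason, Steffens (1970) derived series representations for probabilities of the form $P_1 = \Pr(|X_1| < A, |X_2| < A)$ and $P_2 = \Pr(|X_1| > A, |X_2| > A)$. It is seen that

$$
\begin{aligned}
P_1 ={}& 2\exp\left(-\frac{\xi_1^2 + \xi_2^2}{2}\right) \sum_{k=0}^{\infty}\sum_{l=0}^{\infty} \frac{(k+l)!\,(\xi_1^2/2)^k (\xi_2^2/2)^l}{k!\, l!\, \Gamma(k+1/2)\,\Gamma(l+1/2)} \\
&\times \int_0^{\pi/4} \left(\sin^{2k} v \cos^{2l} v + \sin^{2l} v \cos^{2k} v\right) I_\alpha\left(k+l+1, \frac{\nu}{2}\right) dv
\end{aligned}
$$

and

$$
\begin{aligned}
P_2 ={}& 2\exp\left(-\frac{\xi_1^2 + \xi_2^2}{2}\right) \sum_{k=0}^{\infty}\sum_{l=0}^{\infty} \frac{(k+l)!\,(\xi_1^2/2)^k (\xi_2^2/2)^l}{k!\, l!\, \Gamma(k+1/2)\,\Gamma(l+1/2)} \\
&\times \int_0^{\pi/4} \left(\sin^{2k} v \cos^{2l} v + \sin^{2l} v \cos^{2k} v\right) \left\{1 - I_\beta\left(k+l+1, \frac{\nu}{2}\right)\right\} dv,
\end{aligned}
$$

where $I_x$ denotes the incomplete beta function ratio, $\alpha = A^2 \sec^2 v/(\nu + A^2 \sec^2 v)$, and $\beta = A^2 \operatorname{cosec}^2 v/(\nu + A^2 \operatorname{cosec}^2 v)$. Using these representations, Steffens estimated values of the critical points A for all combinations of $\nu = 1, 2, 5, 10, 20, 50, \infty$ and $\xi_1, \xi_2 = 0(1)5$ for the significance level 0.05. In a more recent development, Bohrer et al. (1982) developed a flexible algorithm to compute probabilities of the form $\Pr(c_{11} < X_1 < c_{21}, \ldots, c_{1p} < X_p < c_{2p})$ associated with the noncentral p-variate t distribution (5.1).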


6.6 Dutt’s Probability Integrals 


Dutt (1975) obtained a Fourier transform representation for the probability integral of a central p-variate t distribution with degrees of freedom $\nu$ and correlation matrix R,

$$P(y_1, \ldots, y_p) = \int_{-\infty}^{y_1} \cdots \int_{-\infty}^{y_p} f(x_1, \ldots, x_p; \nu)\, dx_p \cdots dx_1. \qquad (6.17)$$

Using the definition of multivariate t, one can rewrite (6.17) as

$$P(y_1, \ldots, y_p) = \frac{2^{1-\nu/2}}{\Gamma(\nu/2)} \int_0^\infty z^{\nu-1} \exp\left(-z^2/2\right) G(q_1, \ldots, q_p)\, dz, \qquad (6.18)$$

where $q_k = y_k z/\sqrt{\nu}$, $k = 1, \ldots, p$, and G is the joint cdf of the multivariate normal distribution with zero means and correlation matrix R. In the case $y_k = 0$ for all k, one has P independent of $\nu$ and

$$P(0, \ldots, 0) = G(0, \ldots, 0).$$
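The mixture representation (6.18) can be illustrated in the simplest case p = 1, where G is the univariate standard normal cdf and P is the Student t cdf. The sketch below is an assumption-laden illustration rather than Dutt's procedure: it uses composite Simpson quadrature on a truncated range instead of Gauss-Hermite quadrature, and the function names and the truncation point `z_max` are hypothetical choices.

```python
import math


def norm_cdf(x):
    """Standard normal cdf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def t_cdf_mixture(y, nu, n=4000, z_max=None):
    """Student t cdf from (6.18) with p = 1, for integer nu >= 1.

    Integrates the chi density with nu degrees of freedom against
    Phi(y*z/sqrt(nu)) by Simpson's rule on [0, z_max]; n must be even.
    """
    if z_max is None:
        z_max = math.sqrt(nu) + 12.0  # the chi density is negligible beyond
    h = z_max / n
    log_c = (1.0 - 0.5 * nu) * math.log(2.0) - math.lgamma(0.5 * nu)
    total = 0.0
    for i in range(n + 1):
        z = i * h
        weight = 1 if i in (0, n) else (4 if i % 2 else 2)
        if z > 0.0:
            dens = math.exp(log_c + (nu - 1.0) * math.log(z) - 0.5 * z * z)
        else:
            dens = math.exp(log_c) if nu == 1 else 0.0
        total += weight * dens * norm_cdf(y * z / math.sqrt(nu))
    return total * h / 3.0
```

For $\nu = 1$ and $\nu = 2$ the result can be compared with the closed forms $1/2 + \arctan(y)/\pi$ and $1/2 + y/(2\sqrt{2 + y^2})$.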


Explicit forms of G for p = 2, 3, 4 in terms of the D-functions are given in Dutt (1973). The D-functions are integral forms over $(-\infty, \infty)$ defined by


|i*| oo oe) di, 
Dg (t1,.--, tp; R) = oar | sane Sı 
—00 —0o 


k k 
x exp (Zes — Soe 2) dsk `- ds1, 
1=0 1=0 
where the first five $d_k$ are

$$
\begin{aligned}
d_1 ={}& 1, \\
d_2 ={}& d_{12}, \\
d_3 ={}& d_{12+13+23} - (d_{12} + d_{13} + d_{23}), \\
d_4 ={}& -d_{12+13+23+14+24+34} + d_{12+13+23} + d_{12+14+24} + d_{13+14+34} + d_{23+24+34} \\
& - (d_{12} + d_{13} + d_{23} + d_{14} + d_{24} + d_{34}), \\
d_5 ={}& -d_{12+13+23+14+24+34+15+25+35+45} + d_{12+13+23+14+24+34} + d_{12+13+23+15+25+35} \\
& + d_{12+14+24+15+25+45} + d_{13+14+34+15+35+45} + d_{23+24+34+25+35+45} \\
& - (d_{12+13+23} + d_{12+14+24} + d_{12+15+25} + d_{13+14+34} + d_{13+15+35} + d_{14+15+45} \\
& \quad + d_{23+24+34} + d_{23+25+35} + d_{24+25+45} + d_{34+35+45}) + d_{12} + d_{13} + \cdots + d_{45},
\end{aligned}
$$

and

$$d_{p_1q_1 + \cdots + p_mq_m} = 1 - \exp\left\{-\left(r_{p_1q_1} s_{p_1} s_{q_1} + \cdots + r_{p_mq_m} s_{p_m} s_{q_m}\right)\right\}.$$


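Using the notation

$$D_{k:j_1,\ldots,j_k} = D_k\left\{t_{j_1}, \ldots, t_{j_k}; R\left(t_{j_1}, \ldots, t_{j_k}\right)\right\},$$

where $R(t_{j_1}, \ldots, t_{j_k})$ is the correlation matrix based on the subscripts $j_1, \ldots, j_k$, Dutt (1973) provided the following explicit forms for G:

$$G(t_1, t_2) = \{1 - \Phi(t_1)\}\{1 - \Phi(t_2)\} + D_{2:1,2},$$

$$
\begin{aligned}
G(t_1, t_2, t_3) ={}& \{1 - \Phi(t_1)\}\{1 - \Phi(t_2)\}\{1 - \Phi(t_3)\} \\
&+ \{1 - \Phi(t_1)\} D_{2:2,3} + \{1 - \Phi(t_2)\} D_{2:1,3} \\
&+ \{1 - \Phi(t_3)\} D_{2:1,2} + D_{3:1,2,3},
\end{aligned}
$$

and

$$
\begin{aligned}
G(t_1, t_2, t_3, t_4) ={}& \prod_{k=1}^{4} \{1 - \Phi(t_k)\} + \{1 - \Phi(t_1)\}\{1 - \Phi(t_2)\} D_{2:3,4} \\
&+ \{1 - \Phi(t_1)\}\{1 - \Phi(t_3)\} D_{2:2,4} \\
&+ \{1 - \Phi(t_2)\}\{1 - \Phi(t_3)\} D_{2:1,4} \\
&+ \{1 - \Phi(t_1)\}\{1 - \Phi(t_4)\} D_{2:2,3} \\
&+ \{1 - \Phi(t_2)\}\{1 - \Phi(t_4)\} D_{2:1,3} \\
&+ \{1 - \Phi(t_3)\}\{1 - \Phi(t_4)\} D_{2:1,2} \\
&+ \{1 - \Phi(t_1)\} D_{3:2,3,4} + \{1 - \Phi(t_2)\} D_{3:1,3,4} \\
&+ \{1 - \Phi(t_3)\} D_{3:1,2,4} + \{1 - \Phi(t_4)\} D_{3:1,2,3} \\
&+ D_{4:1,2,3,4}.
\end{aligned}
$$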


A much simplified representation for G in terms of the error function, erf(·), and integral forms over $(0, \infty)$, denoted as the D*-functions, is given in a later paper by Dutt (1975). These D*-functions are defined by


ee ee E 
k (tis---tp; R) s, (27)F 5 ds 0o S1°''Sk 


k 
x exp (- ya /2) ds, --- ds}, (6.19) 
i=0 


where for the first few k are 


dă = sin(tisı), 
d, = 
2 = €—12COS1—2 — €12 COS1 +2, 
k 
d3 = €12+13+23+14+24+34 COS1424344 


+e12—13—23—14—24+34 COS—1~2+3+4 
+e—12+13—23—14+24—34 COS_142-344 
+e—12—13+23+14—24—34 COS1—2—3+4 
—E—12—13+23—14+24+34 COS_14243+44 
—€~12+413-23414—24434 COS1—2+3+4 
—€12-13~23414+4+24—34 COS1+2+3+4 


—€12413+23-14—24—34 COS1+2+3—4 


and for notation 


Cprarttpmam = EXP È- (pia Spı Saa +3 + Tpm am Spm Sam) J > 
SiINp,+--+pm = SİN (tp Sp, +t + tpm Spm) > 
COSpi+:+pm = COS (tpi Sp, +++ + pm Spm): 


(A negative sign on the index $p_1q_1$ corresponds to $+r_{p_1q_1} s_{p_1} s_{q_1}$, and $-p_1$ corresponds to $-t_{p_1} s_{p_1}$.) Important special cases of these functions are

$$D^*_1(y) = \frac{1}{2} \operatorname{erf}\left(\frac{y}{\sqrt{2}}\right),$$

$$D^*_2(0, 0; R) = \frac{1}{2\pi} \arcsin(r_{12}),$$

and

$$D^*_k(0; R) = 0, \quad \text{for } k \text{ odd}.$$

Using the abbreviation that

$$D^*_{k:j_1,\ldots,j_k} = D^*_k\left\{t_{j_1}, \ldots, t_{j_k}; R\left(t_{j_1}, \ldots, t_{j_k}\right)\right\},$$

Dutt (1975) provided the following representation for G:

$$
G(t_1, \ldots, t_p) = \left(\frac{1}{2}\right)^p + \left(\frac{1}{2}\right)^{p-1} \sum_{k=1}^{p} D^*_{1:k} + \left(\frac{1}{2}\right)^{p-2} \sum_{k<l=1}^{p} D^*_{2:k,l} + \left(\frac{1}{2}\right)^{p-3} \sum_{k<l<m=1}^{p} D^*_{3:k,l,m} + \cdots + D^*_{p:1,\ldots,p}.
$$

Hence, by (6.18), the computation of P in (6.17) can be achieved by successive applications of the Gauss-Hermite quadrature formula using only positive Hermite zeros (Abramowitz and Stegun, 1964, page 924). There are several advantages to this approach. First, it is not necessary to invert the correlation matrix. In addition, (6.19) permits the use of Gauss quadrature formulas that are remarkably effective in estimating the value of an integral from a few points, provided that the integral excluding the weighting function can be accurately approximated by a polynomial. Moreover, often the integrand separates as a product of two functions, one depending only on correlation coefficients and the other on the original limits of integration.

For selected correlation structures and several values of $\nu$ and $y = y_k$, $k = 1, \ldots, p$, Dutt (1975) computed values of P accurate up to six decimal places.


6.7 Amos’ Probability Integral 


For the equicorrelation structure $r_{ij} = \rho$, $i \neq j$, considered by Gupta and Sobel (1957) and Gupta (1963), but with the common $\rho$ taken to be any positive real number less than 1, Amos (1978) derived the following simpler expression for the probability integral


_ XAT ((v +1)/2) [” _ dz? 
P(d) = Fat Ry? i: exp ( 9 ) 


CE 
x 6?(x)erfc | -—= | dz, 6.20 
ete ( a) oe) 
where erfc(·) is the complementary error function defined by

$$\operatorname{erfc}(x) = \frac{2}{\sqrt{\pi}} \int_x^\infty \exp\left(-z^2\right) dz$$

and a, b, c, d are constants given by


Lp 


a= — 


p 


3
hp = OE
VIZA 

ab 
V1+ 8’ 


a? 


1+62° 


The reduction to (6.20) was obtained by means of a relationship between the parabolic cylinder function and the complementary error function. Amos (1978) suggested computing the integral (6.20) by locating the $x_0$ for which the derivative of the integrand is zero and then summing quadratures on intervals of length h to the left and right of $x_0$ until a limit of integration is reached or the truncation error is small enough. The motivation for this procedure comes from the fact that $x_0$ can vary widely with extreme parameter values, and h, which estimates the spread of the integrand, can be small or large. Thus, $x_0$ and h accommodate the parameters, producing meaningful results by preventing quadratures over tails that are negligible or preventing gross misjudgments of the scale of integration. Letting g(x) denote the integrand of (6.20), Amos (1978) showed that the derivative of log g(x) decreases monotonically from $\infty$ to $-\infty$ as x traverses $(-\infty, \infty)$, guaranteeing a unique root $x_0$ of $g'(x) = 0$.


6.8 Fujikoshi’s Probability Integrals 


Fujikoshi (1988) provided asymptotic expansions as well as error bounds 
for the probability integral (6.17) when the correlation matrix R = Ip, 
the p x p identity matrix. Specifically, letting 


di F E 
Qô, j (Y1, ----Yp) = dsi {e (s Hy) a (s En 
where 6 = —1, 1, and ® denotes the cdf of the standard normal distri- 


bution, Fujikoshi established the following approximation for the prob- 
ability integral 


, 
s=1 


k-1 
1 

Pith einaey = P (ys) (Up) + D459 (Wa --¥e) 
jail 


a(-y ay
which we shall denote by $A_{\delta,k}(y_1, \ldots, y_p)$. Fujikoshi also derived uniform and nonuniform error bounds for this approximation. Under the assumptions that


ās = supļ|ask (y1,--.,Y¥p)| < 00, 
y 


PCF) fem CG) Jee 


the uniform bound takes the form 


and 


sup |P (41,---Y¥p) — Ase (yr,--+s Yp)l 
y 
2 k 
EG- 
v x2 
G5n(l) = up (1+ Ily II!) lase (y1,---.¥p)| < 00 


(e A 


the nonuniform bound takes the form 


1 
< gE 


Under the assumptions that 


and 


|P (Y1, ---,Yp) — Abe (Y1, ---; Yp) 
1 £ 2\ 1/2 
< gly) ‘auoe ($) x 


k 
+ 


2 
Xv _4 2 
v v 2 
Vv 


J 
Clearly the latter bounds are improvements on the uniform bounds in the tail part of the multivariate t distribution. In the case p = 1, these results provide useful approximations for the univariate Student's t distribution; see Fujikoshi (1987) and Fujikoshi and Shimizu (1989). The special case of (6.21) for $y_j = y$ has been investigated more recently by Fujikoshi (1988, 1989, 1993), Fujikoshi and Shimizu (1990), and Shimizu and Fujikoshi (1997).


6.9 Probabilities of Cone 


Consider the p-dimensional set

$$A_r(c) = \left\{x : z^T x \le r\|z\| \text{ for all } z \in E(c)\right\}, \qquad (6.22)$$

where $E(c) = \{z : z_1 \ge c\|z\|\}$, $\|z\| = \sqrt{z^T z}$, and c is a nonnegative constant. The set E(c) is the cone, with vertex at the origin, which intersects origin-centered spheres in spherical caps. This is illustrated in Figure 6.1 for p = 2.

[Fig. 6.1. The sets $A_r(c)$ and $E(c) \cap \{\|z\| = r\}$ in two dimensions]

Bohrer (1973) studied the analytical shape of $A_r(c)$ and the associated probability

$$p(c, r, p, \nu) = \Pr(X \in A_r(c))$$

when X has the p-variate t distribution with mean vector 0, covariance matrix $\sigma^2 I_p$, and degrees of freedom $\nu$. The evaluation of $p(c, r, p, \nu)$ is of statistical interest and use in the construction of confidence bounds (Wynn and Bloomfield, 1971, Section 3; Bohrer and Francis, 1972, equation (2.3)) and in testing multivariate hypotheses (Kudô, 1963, Theorem 3.1, Section 5; Barlow et al., 1972, pages 136ff, 177).

As regards the shape, Bohrer showed that every two-dimensional section of $A_r$ containing the $z_1$-axis is exactly the two-dimensional version of $A_r$ illustrated in Figure 6.1. Thus, $A_r$ is the solid of revolution about the $z_1$-axis that is swept out by the $A_r$ in Figure 6.1. To express this more precisely in mathematical terms, for a p x 1 vector v define polar coordinates $R_v$ and $\theta_v = \{\theta_{vi}\}$, with $-\pi < \theta_{vi} \le \pi$, by

$$v_1 = R_v \cos\theta_{v1},$$

$$v_i = R_v \cos\theta_{vi} \prod_{j=1}^{i-1} \sin\theta_{vj}, \qquad i = 2, \ldots, p-1,$$

and

$$v_p = R_v \prod_{j=1}^{p-1} \sin\theta_{vj}.$$

Also define

$$
\begin{aligned}
\theta^* &= \arccos c, \\
T_1 &= \{x : |\theta_{x1}| \le \theta^*,\ R_x \le r\}, \\
T_2 &= \{x : \theta_{x1} - \theta^* \in (0, \pi/2],\ R_x \cos(\theta_{x1} - \theta^*) \le r\}, \\
T_3 &= \{x : \theta_{x1} + \theta^* \in [-\pi/2, 0),\ R_x \cos(\theta_{x1} + \theta^*) \le r\},
\end{aligned}
$$

and

$$T_4 = \{x : |\theta_{x1}| > \theta^* + \pi/2\}.$$

Then the set $A_r$ is the union of the disjoint sets $T_1, \ldots, T_4$. As regards evaluating the probability $p(c, r, p, \nu)$, Bohrer (1973) derived the following expression

O RØ) r? k (1/2 — 0*) 
plc,r,p,v) = TED (Fe < =) + e/a) 
ae j+1 p-j-1)\ (p-2 
KG 2 4S. 2 ee 


cone 2 
xo! (1- 7)” a (Aaa < =) , 
oO
where $k(\theta)$ is given by


2™m!)? in?” 
TEE ( r stehe cos ĝ sin“™ 0 
X sin2(m—) saat jy p-— 2j 

= p-—2l ra al ea 
when p = 2m +1 is odd and by 


(p—1)!9  _ sinP™?—?™ 9 cos0 
22-1 (m — 1)!m! p 


S sin?-!-! A cosð a7 p +1 -2j 


p—2l ja PH 2-25 


k(6) = 


when p = 2m is even. The statistical questions that motivate this work ask what radius r is required so that $p(c, r, p, \nu) = \alpha$ for preassigned values of $\alpha$. For p < 5, Bohrer (1973) provided tables of these percentiles for $\alpha$ = 0.95 and 0.99 and for a range of $(c, \nu)$ pairs.


6.10 Probabilities of Convex Polyhedra 


It is well known (Nicholson, 1943; Cadwell, 1951; Owen, 1956) that probabilities of polygons under bivariate normal distributions can be evaluated in terms of probabilities of right-angled triangles with vertices (0, 0), $(y_1, 0)$, $(y_1, y_2)$, $y_j > 0$, j = 1, 2, under bivariate normal distributions with zero correlation. John (1964) proved an analogous result that probabilities of polygonal and angular regions for a given bivariate t distribution can be expressed in terms of $V_\nu(y_1, y_2)$, the integral of

$$f(x_1, x_2; \nu) = \frac{\Gamma((\nu+2)/2)}{\pi\nu\,\Gamma(\nu/2)} \left\{1 + \frac{x_1^2 + x_2^2}{\nu}\right\}^{-(\nu+2)/2}$$

over the right-angled triangles with vertices (0, 0), $(y_1, 0)$, and $(y_1, y_2)$. John (1964) also provided several formulas for evaluating $V_\nu(y_1, y_2)$. A formula in terms of the incomplete beta function is


1 
Vp (y1,y2) = pP arctan (2) 


në 11 


k v 
J c" By (zrehi) (623) 
An Jv Hy? a 2 22
where
= v 
vty?’ 
= v+ y? 
v +y ty 
and 


1 
B,(a,b) = [wa wide 


is the incomplete beta function. This series converges slowly unless $y_1$ is large in relation to $\nu$. In the two cases $\nu$ odd and $\nu$ even, (6.23) can be reduced considerably. If $\nu = 2m$ for a positive integer m, then


VIe 11 
Vom (¥1,Yy2) = any ee ye cB, (i + z 5) (6.24) 
k=0 


while if v = 2m + 1 for a nonnegative integer m, then 


1 Yo 1 1 1 
Vom+1 (Yi, Y2) = zp arctan (2) — qP” (G 3 
c(l—c) 1 
+ — Di eB, (k+1,5 } , (6.28) 
k=0 
where 


v (v +y? +y) 
v(v +y? + y3) tyy 


v = 


An attractive feature of (6.24) and (6.25) is that, when utilizing them for evaluating $V_{2m}$ and $V_{2m+1}$, they are already evaluated for lower values of m also. If one performs the summations in the order indicated in the formulas, the addition of each term will yield values of $V_{2m}$ or $V_{2m+1}$ for the next higher value of m. This feature makes it particularly suitable for use in preparing tables.

A second formula for $V_\nu(y_1, y_2)$ given in John (1964) is an expansion in powers of $1/\nu$

2 © ik 

i » Un (y1, 2) 5 


k=1 


V, (yi, Y2) = Væ (y1,y2) z 


(6.26)
where the first three $U_k$ are given by


4 
UY, (y1, y2) = “w, (y1, y2) , 


1 y? 
U> (y1 y2) = Bi = zm (yi, Y2) + TUL (y1, y2) \ 
and 
_ 3)3 y? yi 
; = 1 ; ; ; ; 
Us (y1, ya) y 1 (yi, y2) -4 (yi, Y2) + 64 (y1: Y2) 


where 


y2/Y1 A y?t? 
W, (m,y2) = I (1+) exp (-4 ) dt. 
0 


The term $V_\infty$ in (6.26) is the integral of $\exp\{-(x_1^2 + x_2^2)/2\}/(2\pi)$ over the right-angled triangle with vertices (0, 0), $(y_1, 0)$, and $(y_1, y_2)$. The method of derivation for (6.26) is similar to the classical method employed by Fisher (1925) for expanding the probability integral of Student's t. Despite the complexity of (6.26) over (6.23), (6.26) should be preferred if $\nu$ is sufficiently large. The first two or three terms of (6.26) then can be expected to provide fairly accurate values of $V_\nu$.

John (1964) also provided a recurrence relation and an approximation for $V_\nu(y_1, y_2)$; the latter proved to be satisfactory only when either $\nu$ is too small or $y_2/y_1$ is too large. In a subsequent paper, John (1966) extended this result to higher dimensions, by showing that the probabilities of the p-dimensional convex polyhedra with vertices (0, 0, 0, ..., 0), $(y_1, 0, 0, \ldots, 0)$, $(y_1, y_2, 0, \ldots, 0)$, ..., $(y_1, y_2, y_3, \ldots, y_p)$, $y_j > 0$, $j = 1, 2, \ldots, p$, under a p-variate t distribution with $\nu$ degrees of freedom can be expressed in terms of the function $V_\nu(y_1, y_2, \ldots, y_p)$, the integral of the p-variate t pdf

$$f(x_1, x_2, \ldots, x_p; \nu) = \frac{\Gamma((\nu+p)/2)}{(\nu\pi)^{p/2}\,\Gamma(\nu/2)} \left\{1 + \frac{x_1^2 + x_2^2 + \cdots + x_p^2}{\nu}\right\}^{-(\nu+p)/2}$$

over the same p-dimensional convex polyhedra. John also provided an important asymptotic expansion in powers of $1/\nu$ connecting $V_\nu(y_1, y_2, \ldots, y_p)$ with $V(y_1, y_2, \ldots, y_p)$, the integral of the p-variate normal pdf

$$f(x_1, x_2, \ldots, x_p; \infty) = (2\pi)^{-p/2} \exp\left\{-\left(x_1^2 + x_2^2 + \cdots + x_p^2\right)/2\right\}$$

over the same polyhedra discussed above. Up to the order of the term $O(1/\nu^2)$, the expansion is

$$
\begin{aligned}
V_\nu(y_1, y_2, \ldots, y_p) ={}& V(y_1, y_2, \ldots, y_p) \\
&+ \frac{1}{4\nu} \left\{y_1 y_2 f(y_1, y_2) V(y_3, y_4, \ldots, y_p) - y_1(1 + y_1^2) f(y_1) V(y_2, y_3, \ldots, y_p)\right\} \\
&+ \frac{1}{96\nu^2} \Big\{3y_1 y_2 y_3 y_4 f(y_1, y_2, y_3, y_4) V(y_5, y_6, \ldots, y_p) \\
&\quad - y_1 y_2 y_3 \left(2 + 9y_1^2 + 6y_2^2 + 3y_3^2\right) f(y_1, y_2, y_3) V(y_4, \ldots, y_p) \\
&\quad - y_1 y_2 \left(3 + 5y_1^2 + y_2^2 - 9y_1^4 - 9y_1^2 y_2^2 - 3y_2^4\right) f(y_1, y_2) V(y_3, \ldots, y_p) \\
&\quad + y_1 \left(3 + 5y_1^2 + 7y_1^4 - 3y_1^6\right) f(y_1) V(y_2, \ldots, y_p)\Big\} \\
&+ O\left(\frac{1}{\nu^3}\right).
\end{aligned}
$$

In this formula, $V(y_m, y_{m+1}, \ldots, y_p)$ is to be replaced by 0 if $m \ge p + 2$ and by 1 if $m = p + 1$. In principle, there is no difficulty in determining further terms of this expansion, but the coefficients of higher powers of $(1/\nu)$ have rather complicated expressions. Other useful results given by John (1966) include recursion formulas connecting $V_\nu(y_1, y_2, \ldots, y_p)$ with $V_{\nu+2}(y_1, y_2, \ldots, y_p)$.

More recently, several authors have looked into the problem of computing
multivariate t probabilities of the form

\[
P = \int_A f(\mathbf{x})\, d\mathbf{x}, \tag{6.27}
\]


where X has the central multivariate t distribution with correlation matrix
R and A is any convex region. Somerville (1993a, 1993b, 1993c,
1994) developed the first known procedures for evaluating P in (6.27).
Let MM^T be the Cholesky decomposition of R (where M is a lower
triangular matrix) and set X = MW. Then W is multivariate t with
correlation matrix I_p. If one further sets r² = W^T W, then F = r²/p
has the well-known F distribution with degrees of freedom p and ν. Let
A be the region bounded by p hyperplanes and described by

\[
G W \le d,
\]

where G = (g_1, ..., g_p) and the jth hyperplane is g_j W = d_j. For a
random direction c, let r be the distance from the origin to the boundary
of A along c, that is, the smallest positive distance from the origin to the jth
plane, j = 1, ..., p. Then an unbiased estimate of the integral P in
(6.27) is

\[
\Pr\left(F \le r^2/p\right). \tag{6.28}
\]


To implement the procedure, Somerville chose successive random direc- 
tions c and obtained corresponding estimates of (6.28). The value of P 
was then taken as the arithmetic mean of the individual estimates. 
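The random-direction idea can be sketched in a few lines for the bivariate case, where no special-function library is needed: for p = 2 the F(2, ν) cdf has the closed form 1 − (1 + 2x/ν)^(−ν/2), so Pr(F ≤ r²/2) = 1 − (1 + r²/ν)^(−ν/2). The code below is only an illustration under stated assumptions (p = 2, identity correlation after the Cholesky step, origin strictly inside A so that every d_j > 0); the function names are hypothetical and this is not Somerville's Fortran implementation.

```python
import math
import random

def radial_cdf_bivariate_t(r, nu):
    """Pr(W'W <= r^2) for a spherical bivariate t vector with nu df."""
    if math.isinf(r):
        return 1.0
    return 1.0 - (1.0 + r * r / nu) ** (-nu / 2.0)

def boundary_distance(c, planes):
    """Smallest positive distance from the origin to the planes g.w = d along c."""
    r = math.inf
    for g, d in planes:
        gc = g[0] * c[0] + g[1] * c[1]
        if gc > 0.0:  # the ray can only reach planes it moves toward
            r = min(r, d / gc)
    return r

def direction_estimate(planes, nu, n_dirs, rng):
    """Arithmetic mean of the per-direction estimates Pr(F <= r^2 / p)."""
    total = 0.0
    for _ in range(n_dirs):
        theta = rng.uniform(0.0, 2.0 * math.pi)
        c = (math.cos(theta), math.sin(theta))
        total += radial_cdf_bivariate_t(boundary_distance(c, planes), nu)
    return total / n_dirs

if __name__ == "__main__":
    # Half-plane w_1 <= 1 with nu = 1: the marginal is Cauchy, so P = 0.75.
    rng = random.Random(1)
    print(direction_estimate([((1.0, 0.0), 1.0)], 1.0, 20000, rng))
```

Each per-direction value lies in [0, 1], so the sample standard error of the mean gives a direct accuracy check.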

Somerville (1997, 1998b) provided the following modification of the 
above procedure. Let r* be the minimum distance from the origin to 
the boundary of A, that is, the smallest of the r for all random directions 
c. Divide A into two regions, the portion inside the hypersphere of radius 
r* and centered at the origin, and the region outside. The probability 
content of the hypersphere is 


\[
P_1 = \Pr\left(F \le r^{*2}/p\right),
\]


and this can be estimated as in Somerville (1993a, 1993b, 1993c, 1994).
If E(v) and e(v), respectively, denote the cdf and the pdf of v = 1/r
(the reciprocal of the distance from the origin to the boundary of A),
then the probability content of the outer region is

\[
P_2 = \int_0^{1/r^*} E(v)\, e(v)\, dv.
\]

Since F = r²/p, the pdf of v is

\[
e(v) = \frac{2\,\nu^{\nu/2}\,\Gamma((\nu+p)/2)}{\Gamma(\nu/2)\,\Gamma(p/2)}\,
\frac{v^{\nu-1}}{\left(1 + \nu v^2\right)^{(\nu+p)/2}}.
\]


The strategy is to use some numerical method to estimate E(v) and then
evaluate the integral P_2 using Gauss-Legendre quadrature. The approaches
of Somerville (1997, 1998b) differ in that Somerville (1997) applied
Monte Carlo techniques to estimate E(v) while Somerville (1998b)
used a binning procedure. It should be noted, however, that an approach
similar to these had been introduced earlier by Deák (1990).
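The density of v = 1/r follows from r²/p having the F distribution with p and ν degrees of freedom: it is 2 ν^(ν/2) Γ((ν+p)/2) v^(ν−1) / [Γ(ν/2) Γ(p/2) (1 + νv²)^((ν+p)/2)]. The sketch below, with hypothetical function names and a plain midpoint rule standing in for Gauss-Legendre quadrature, checks this density against a case with a closed form: for p = 2, Pr(v ≤ v0) = (1 + 1/(ν v0²))^(−ν/2).

```python
import math

def e_pdf(v, nu, p):
    """pdf of v = 1/r when r^2 / p follows the F(p, nu) distribution."""
    if v <= 0.0:
        return 0.0
    log_c = (math.log(2.0) + math.lgamma((nu + p) / 2) - math.lgamma(nu / 2)
             - math.lgamma(p / 2) + (nu / 2) * math.log(nu))
    return math.exp(log_c + (nu - 1) * math.log(v)
                    - (nu + p) / 2 * math.log1p(nu * v * v))

def cdf_v(v0, nu, p, n=4000):
    """Pr(v <= v0) by the midpoint rule on [0, v0]."""
    h = v0 / n
    return h * sum(e_pdf((i + 0.5) * h, nu, p) for i in range(n))

if __name__ == "__main__":
    nu, v0 = 4, 1.0
    closed_form = (1.0 + 1.0 / (nu * v0 * v0)) ** (-nu / 2.0)
    print(cdf_v(v0, nu, 2), closed_form)
```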
Somerville (1999a) provided an extension of the above methodologies 
to evaluate P in (6.27) when A is an ellipsoidal region. This has po- 
tential applications in the field of reliability (in particular relating to 
the computation of the tolerance factor for multivariate normal popula- 
tions) and to the calculation of probabilities for linear combinations of 
central and noncentral chi-squared and F. In the coordinate system of 
the transformed variables W, assume, without loss of generality, that
the axes of the ellipsoid are parallel to the coordinate axes and the
ellipsoid has the equation (w − μ)^T B^{-1} (w − μ) = 1, where B is a diagonal
matrix with the ith diagonal element given by b_i. If the ellipsoid contains the
origin, then for each random direction c there is a unique distance r to
the boundary. An unbiased estimate of P is then given by

\[
\Pr\left(F \le r^2/p\right).
\]


If the ellipsoid does not contain the origin, then, for a random direction,
a line from the origin in that direction either intersects the boundary of
the ellipsoid at two points (say, at distances r_1 > r_2 from the origin) or
does not intersect it at all. If the line intersects the boundary, then an
unbiased estimate of P is given by the difference

\[
\Pr\left(F \le r_1^2/p\right) - \Pr\left(F \le r_2^2/p\right).
\]


If the line does not intersect the ellipsoid, an unbiased estimate is 0. 
As in the first procedure described above, this is repeated for successive 
random directions c, each providing an unbiased estimate. The value 
of P is then taken as the arithmetic average. A modification of this 
procedure along the lines of Somerville (1997, 1998b) is described in 
Somerville (1999a). 

Somerville (1999b) provided an application of his methods for multiple 
testing and comparisons by taking A in (6.27) to be 


A = {x € RP :maxe?x < q/v3}, ce B, 


where B is the set of contrasts corresponding to the different hypotheses
and q > 0. The purpose is to calculate the value of q for arbitrary
R and ν and arbitrary sets B such that the probability content of A
has a preassigned value γ. Somerville and Bretz (2001) have written
two Fortran 90 programs (QBATCH4.FOR and QINTER4.FOR) and
two SAS-IML programs (QBATCH4.SAS and QINTER4.SAS) for this
purpose. QINTER4.FOR and QINTER4.SAS are interactive programs,
while the other two are batch programs. A compiled version of the
Fortran 90 programs that should run on any PC with Windows 95 or
later can be found at


http://pegasus.cc.ucf.edu/~somervil/home.html


These programs implement the methodology described above to evaluate
the probability content of A. (A Fortran 90 program MVI3.FOR used to
evaluate multivariate t integrals over any convex region is described in
Somerville (1998a). An extended Fortran 90 program MVELPS.FOR to
evaluate multivariate t integrals over any ellipsoidal region is described
in Somerville (2001). The average running times for the latter program
range from 0.075 and 0.109 second for p = 2 and 3, respectively, to
0.379 and 0.843 second for p = 10 and 20, respectively.) The so-called
“Brent’s method,” an iterative procedure described in Press (1986), is
used to solve for the value of q. The time to estimate the q values (with
a standard error of 0.01) using QINTER4 or QBATCH4 ranges from 10
seconds for Dunnett’s multiple comparisons procedure to 52 seconds for
Tukey’s procedure, using a 486-33 processor.

A problem that frequently arises in statistical analysis is to compute 
(6.27) when A is a rectangular region, that is, 


\[
P = \int_{a_1}^{b_1} \int_{a_2}^{b_2} \cdots \int_{a_p}^{b_p}
f(x_1, x_2, \ldots, x_p)\, dx_p \cdots dx_2\, dx_1. \tag{6.29}
\]


Wang and Kennedy (1997) employed numerical interval analysis to compute
P. The method is similar to the approaches of Corliss and Rall
(1987) for univariate normal probabilities and Wang and Kennedy (1990)
for bivariate normal probabilities. The basic idea is to apply the
multivariate Taylor expansion to the joint pdf f. Letting c_j = (a_j + b_j)/2,
the Taylor expansion of f at the midpoint (c_1, c_2, ..., c_p) is
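\[
\begin{aligned}
f(x_1, x_2, \ldots, x_p) = {}& \sum_{k=0}^{n-1} \sum_{]k[}
\frac{1}{k_1! \cdots k_p!}\,
\frac{\partial^k f(c_1, c_2, \ldots, c_p)}{\partial x_1^{k_1} \partial x_2^{k_2} \cdots \partial x_p^{k_p}}
\prod_{j=1}^{p} (x_j - c_j)^{k_j} \\
& + \sum_{]n[} \frac{1}{k_1! \cdots k_p!}\,
\frac{\partial^n f(\xi_1, \xi_2, \ldots, \xi_p)}{\partial x_1^{k_1} \partial x_2^{k_2} \cdots \partial x_p^{k_p}}
\prod_{j=1}^{p} (x_j - c_j)^{k_j},
\end{aligned}
\tag{6.30}
\]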


where ξ_j is contained in the integration region [a_j, b_j] and ]k[ denotes
all possible partitions of k into p parts. For example, in the case p = 3,
]2[ will result in 6 possible partitions of ‘2’ into {k_1, k_2, k_3}: {0,0,2},
{0,1,1}, {0,2,0}, {1,0,1}, {1,1,0}, and {2,0,0}. The main problem
with computing (6.30) is the presence of high-order partial derivatives
of f. Defining

\[
(f)_{k_1 k_2 \cdots k_p} = \frac{1}{k_1!\, k_2! \cdots k_p!}\,
\frac{\partial^{k_1 + k_2 + \cdots + k_p} f}{\partial x_1^{k_1} \partial x_2^{k_2} \cdots \partial x_p^{k_p}},
\tag{6.31}
\]

Wang and Kennedy derived the following recursive formula


1 xTR-!x\ + 
(fkrko--ky = ee (=>) 


( =E) 
x ae . 
u kı—lı,k2—l2,...,kp—lp 
With regard to the last quadratic term, it should be noted that higher
than second-order partial derivatives are all zero. To carry out the
computation of (6.31) for a given (k_1, k_2, ..., k_p), one can


• first let one l_i be k_i − 1 (if this k_i ≥ 1) and all the other l_j’s be their
corresponding k_j’s;

• next let l_r and l_s be k_r − 1 and k_s − 1, respectively (if k_r ≥ 1 and
k_s ≥ 1), while all the other l_j’s take their corresponding k_j’s;

• finally, let some l_j be k_j − 2 (if k_j ≥ 2) and all other l_i’s be the
corresponding k_i’s.
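The index sets ]k[ that drive the Taylor sum are simply all ways of writing k as an ordered sum of p nonnegative integers. A minimal sketch of their enumeration follows; the function name is an assumption for this example, and the recursion is one of several equivalent ways to generate the tuples.

```python
def compositions(k, p):
    """All p-tuples of nonnegative integers (k_1, ..., k_p) that sum to k."""
    if p == 1:
        return [(k,)]
    out = []
    for first in range(k + 1):
        for rest in compositions(k - first, p - 1):
            out.append((first,) + rest)
    return out

if __name__ == "__main__":
    # The case p = 3, k = 2 discussed in the text: six index tuples.
    for index in compositions(2, 3):
        print(index)
```

The number of tuples is the binomial coefficient C(k + p − 1, p − 1), which grows quickly with p and explains why the cost of the expansion rises steeply with the dimension.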


The total number of terms that contribute to computing (f)_{k_1 k_2 ... k_p} is at
most p(p+3)/2. Compared to the multivariate normal distribution, this
number is larger (Wang and Kennedy, 1990). The following table gives
the running times and the accuracy for computing (6.29) with ν = 10.


Running time and accuracy for computing P in (6.29)


p   Running      a_j = −0.5   a_j = −0.4   a_j = −0.3   a_j = −0.2
    time (min)   b_j = 0.5    b_j = 0.4    b_j = 0.3    b_j = 0.2


10 80 2 sig 4 sig 
9 70 3 sig 7 sig 
8 85 0 sig 5 sig 10 sig 
7 90 3 sig 8 sig 

6 110 3 sig 8 sig 

5 180 10 sig 


Another point to note about Wang and Kennedy’s method is that when
the integration region is near the origin it works better for larger ν, while
when the integration region is off the origin it works better for smaller ν.

The main problem with Wang and Kennedy’s (1997) method is that
the calculation times required are too large even for low accuracy results
(see the table above). Genz and Bretz (1999) proposed a new method for
computing (6.29) by transforming the p-variate integrand into a product
of univariate integrands. The method is similar to the one used by Genz
(1992) for the multivariate normal integral.

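Letting MM^T be the Cholesky decomposition of R, define the following
transformations

\[
X_j = \sum_{k=1}^{p} m_{j,k} Y_k, \qquad
Y_j = U_j \sqrt{\frac{\nu + \sum_{k=1}^{j-1} Y_k^2}{\nu + j - 1}}, \qquad
U_j = T_{\nu+j-1}^{-1}(Z_j),
\]

and

\[
Z_j = d_j + W_j (e_j - d_j),
\]

where T_τ denotes the cdf of the univariate Student’s t distribution with
degrees of freedom τ,

\[
d_j = T_{\nu+j-1}(\hat{a}_j), \qquad e_j = T_{\nu+j-1}(\hat{b}_j),
\]

\[
\hat{a}_j = a'_j \sqrt{\frac{\nu + j - 1}{\nu + \sum_{k=1}^{j-1} y_k^2}}, \qquad
\hat{b}_j = b'_j \sqrt{\frac{\nu + j - 1}{\nu + \sum_{k=1}^{j-1} y_k^2}},
\]

\[
a'_j = \frac{a_j - \sum_{k=1}^{j-1} m_{j,k}\, y_k}{m_{j,j}}
\]

and

\[
b'_j = \frac{b_j - \sum_{k=1}^{j-1} m_{j,k}\, y_k}{m_{j,j}}.
\]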
Applying the above transformations successively, Genz and Bretz reduced
(6.29) to

\[
P = (e_1 - d_1) \int_0^1 (e_2 - d_2) \cdots \int_0^1 (e_p - d_p) \int_0^1 d\mathbf{w} \tag{6.32}
\]

\[
\phantom{P} = \int_0^1 \int_0^1 \cdots \int_0^1 f(\mathbf{w})\, d\mathbf{w}. \tag{6.33}
\]

The transformation has the effect of flattening the surface of the original
function, and P becomes an integral of f(w) = (e_1 − d_1) ··· (e_p − d_p)
over the (p − 1)-dimensional unit hypercube. Hence, one has improved
numerical tractability and (6.33) can be evaluated with different
multidimensional numerical computation methods. Genz and Bretz considered
three numerical algorithms for this: an acceptance-rejection sampling
algorithm, a crude Monte Carlo algorithm, and a lattice rule algorithm.


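• Acceptance-rejection sampling algorithm: Generate p-dimensional uniform
random vectors w_1, w_2, ..., w_N and estimate P by

\[
\hat{P} = \frac{1}{N} \sum_{l=1}^{N} h(M \mathbf{y}_l),
\]

where

\[
h(\mathbf{x}) = \begin{cases} 1 & \text{if } a_j \le x_j \le b_j,\ j = 1, 2, \ldots, p, \\ 0 & \text{otherwise} \end{cases}
\]

and

\[
y_{l,j} = T_{\nu+j-1}^{-1}(w_{l,j}) \sqrt{\frac{\nu + \sum_{k=1}^{j-1} y_{l,k}^2}{\nu + j - 1}},
\qquad j = 1, 2, \ldots, p,\quad l = 1, 2, \ldots, N.
\]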


• A crude Monte Carlo algorithm: Generate (p − 1)-dimensional uniform
random vectors w_1, w_2, ..., w_N and estimate P by

\[
\hat{P} = \frac{1}{N} \sum_{l=1}^{N} f(\mathbf{w}_l),
\]

an unbiased estimator of the integral (6.33).

• A lattice rule algorithm (Joe, 1990; Sloan and Joe, 1994): Generate
(p − 1)-dimensional uniform random vectors w_1, w_2, ..., w_N and
estimate P by

\[
\hat{P} = \frac{1}{N} \sum_{l=1}^{N} \frac{1}{q} \sum_{j=1}^{q}
f\left( \left| 2 \left\{ \frac{j}{q}\, \mathbf{z} + \mathbf{w}_l \right\} - 1 \right| \right).
\]

Here N is the simulation size, usually very small, q corresponds to the
fineness of the lattice, and z ∈ R^{p−1} denotes a strategically chosen
lattice vector. Braces around vectors indicate that each component
has to be replaced by its fractional part. One possible choice of z
follows the good lattice points; see, for example, Sloan and Joe (1994).
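To make the lattice rule concrete, the following sketch averages a few randomly shifted rank-1 lattice rules, taking the fractional part of (j/q) z + w and then applying the map x → |2x − 1| componentwise. It is a generic illustration rather than the Genz and Bretz code: the integrand is a simple stand-in, and the two-dimensional Fibonacci lattice (q = 987, z = (1, 610)) is an assumption chosen only because it is a standard good lattice in two dimensions.

```python
import random

def shifted_lattice_estimate(f, z, q, n_shifts, rng):
    """Mean and standard error over n_shifts randomly shifted lattice rules."""
    dim = len(z)
    shift_means = []
    for _ in range(n_shifts):
        shift = [rng.random() for _ in range(dim)]
        total = 0.0
        for j in range(1, q + 1):
            # Fractional part of (j/q) z + shift, then the map x -> |2x - 1|.
            point = [abs(2.0 * ((j * z[i] / q + shift[i]) % 1.0) - 1.0)
                     for i in range(dim)]
            total += f(point)
        shift_means.append(total / q)
    mean = sum(shift_means) / n_shifts
    var = sum((m - mean) ** 2 for m in shift_means) / (n_shifts * (n_shifts - 1))
    return mean, var ** 0.5

if __name__ == "__main__":
    rng = random.Random(0)
    # Stand-in integrand on the unit square; its exact integral is 1/4.
    est, se = shifted_lattice_estimate(lambda w: w[0] * w[1], (1, 610), 987, 10, rng)
    print(est, se)
```

The spread of the shift means gives the error estimate, which is why a small number N of shifts is usually enough.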


For all three algorithms, to control the simulation error, one may
use the usual standard error estimate of the means. Perhaps the most
intuitive one of the three is the acceptance-rejection method. However, Deák
(1990) showed that, among various methods, it is the one with the worst
efficiency. Genz and Bretz (2001) proposed the use of the lattice rule
algorithm. Bretz et al. (2001) provided an application of this algorithm
for multiple comparison procedures.
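The acceptance-rejection estimate is easy to prototype. The sketch below counts how often simulated multivariate t vectors fall in the rectangle [a, b]; note that it swaps in a different but distributionally equivalent sampler (a normal vector divided by an independent scaled chi variable) instead of the sequential inverse-cdf transformation, so that only the Python standard library is needed. The function names are assumptions for this example.

```python
import math
import random

def cholesky(r):
    """Lower triangular M with M M^T = r, for a small positive definite matrix."""
    p = len(r)
    m = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i + 1):
            s = sum(m[i][k] * m[j][k] for k in range(j))
            if i == j:
                m[i][j] = math.sqrt(r[i][i] - s)
            else:
                m[i][j] = (r[i][j] - s) / m[j][j]
    return m

def hit_or_miss(a, b, r, nu, n, rng):
    """Fraction of simulated multivariate t vectors inside [a, b], with its standard error."""
    p = len(r)
    m = cholesky(r)
    hits = 0
    for _ in range(n):
        z = [rng.gauss(0.0, 1.0) for _ in range(p)]
        # chi-square(nu) / nu, drawn as Gamma(shape = nu / 2, scale = 2) / nu
        s = math.sqrt(rng.gammavariate(nu / 2.0, 2.0) / nu)
        x = [sum(m[i][k] * z[k] for k in range(i + 1)) / s for i in range(p)]
        if all(a[i] <= x[i] <= b[i] for i in range(p)):
            hits += 1
    phat = hits / n
    return phat, math.sqrt(phat * (1.0 - phat) / n)

if __name__ == "__main__":
    rng = random.Random(2)
    r = [[1.0, 0.3], [0.3, 1.0]]
    # With nu = 1 the first marginal is Cauchy, so Pr(X_1 <= 1) = 0.75.
    print(hit_or_miss([-math.inf, -math.inf], [1.0, math.inf], r, 1.0, 20000, rng))
```

The binomial standard error returned alongside the estimate makes the poor efficiency of this approach visible for small probabilities.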

The method of Genz and Bretz (1999) described above also includes 
an efficient evaluation of probabilities of the form 


\[
P = \int_{\mathbf{a}}^{\mathbf{b}} g(\mathbf{x})\, f(\mathbf{x})\, d\mathbf{x},
\]

where g(x) is some nuisance function. Fortran and SAS-IML codes to
implement the method for p < 100 are available from the Web sites with
URLs


http://www.bioinf.uni-hannover.de/~betz/

and

http://www.sci.wsu.edu/math/faculty/genz/homepage.


6.11 Probabilities of Linear Inequalities 


Let X be a random variable characterizing the “load,” and let Y be a
random variable determining the “strength” of a component. Then the
probability that a system is “trouble-free” is Pr(Y > X). In a more
complicated situation, the operation of the system may depend on a
linear combination of random vectors, say a_1^T X_1 + a_2^T X_2 + b, and the
probability of a trouble-free operation will be

\[
\Pr\left(\mathbf{a}_1^T \mathbf{X}_1 + \mathbf{a}_2^T \mathbf{X}_2 + b > 0\right), \tag{6.34}
\]

where X_j are independent k_j-dimensional random vectors, a_j are
k_j-dimensional constant vectors, and b is a scalar constant. Abusev and
Kolegova (2001) studied the problem of constructing unbiased, maximum
likelihood, and Bayesian estimators of the probability (6.34) when
X_j is assumed to have the multivariate t distribution with mean vector
μ_j and correlation matrix R_j. If x_{11}, ..., x_{1n_1} and x_{21}, ..., x_{2n_2} are iid
samples from the two multivariate t distributions, then, in the case
where both μ_j and R_j are unknown, it was established that the unbiased
and the maximum likelihood estimators are


P(ni/2)T aa 
aD ((ny — ee ((n2 — 1)/2) 


xf TI +) hea a ? dindin 
Qı j= | 


Pr (af Xi + al X, +b> 0) = 


and 
Ts Ts 
j a a X b 
Pr (ai Xi +a? X: +b > 0) = © pa ac t0 
ai Sni4181 + a Sn.+142 


respectively, where 


Q = jo <niana 
2 
5 vjq/nja? Snj41aj + 5 al x; +b> o}, 
j=1 j=l 
E 1S 
Xn; = ao) 5 Xm, 
ny m=1 
ntl 
(nj+1)x; = 5 Xjm, 
m=l 
nj+1 
_ LAP 
(ni +1) Sajptt = J (jm — 3i) (Xjm — Ks) 
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and Xn,41 = x. A Bayesian estimator of (6.34) with unknown pa- 
rameters yt; and R; and the Lebesgue measure p(@)d@ = dudR was 
calculated to be 


2 ) 12) nk 
Prg (aj Xı + ay X2 +b > 0) = lee 
jaa TT ((nj — 1)/2) (nj + 1)” 


(nj — kj — 1)/2) 
SIE, = Dp) 


nj-3 


a Tl 1 — z?) ae dz,dz2, 
Q 


2 j=l 


2 2 
Qe = z? < l,j = 1, 2, Soz njal Snj41aj + X al x; +b>0 
j=l j=l 


This Bayesian estimator is biased and is related to the unbiased estima- 
tor via the relation 


Prg (afX;+a3X2+b>0) = APr(a7Xı +a X: +b> 0), 
where 
s nj — kj — 1)/2)T ((n; = k)/2) (nj +) 
jel T (nj — 2k; — 1)/2)T (n;/2) ni? 


The coefficient A can be expanded as 


k k 1 
7 = ts hs eee ener ee 
a Sk (=) 


where n = max(n_1, n_2) and k = max(k_1, k_2). Therefore, the Bayesian
estimator is asymptotically unbiased as n → ∞.

Substantial literature is now available on problems concerning proba- 
bilities of the form (6.34) for various distributions. For a comprehensive 
and up-to-date summary, the reader is referred to Kotz et al. (2003). 


6.12 Maximum Probability Content 


Let X be a bivariate random vector with the joint pdf of the form

\[
f(\mathbf{x}) = g\left((\mathbf{x} - \boldsymbol{\mu})^T R^{-1} (\mathbf{x} - \boldsymbol{\mu})\right), \tag{6.35}
\]

which, of course, includes the bivariate t pdf. Consider the class of
rectangles

\[
R(a) = \left\{(x_1, x_2) : |x_1| \le a,\ |x_2| \le \lambda/(4a)\right\}
\]

with the area equal to λ. Kunte and Rattihalli (1984) studied the problem
of characterizing the region R in this class for which the probability
P(R(a)) = Pr(X ∈ R(a)) is maximum. As noted in Rattihalli (1981),
the characterization of such regions is useful for obtaining Bayes
regional estimators when (i) the decision space is the class of rectangular
regions and (ii) the loss function is a linear combination of the area of
the region and the indicator of the noncoverage of the region. It was
shown that, for any fixed λ > 0, the maximal set is

\[
\left\{(x_1, x_2) : |x_1 - \mu_1| \le c,\ |x_2 - \mu_2| \le \lambda/(4c)\right\},
\]

where c is given by

\[
c = \frac{\sqrt{\lambda}}{2} \left(\frac{r^{22}}{r^{11}}\right)^{1/4}.
\]

Here, r^{ij} denotes the (i, j)th element of the inverse of R. In particular,
if μ = 0, r_{12} = r_{21} = ρ and |ρ| < 1 in (6.35), then P(R(a)) is increasing
for a < √λ/2 and is decreasing for a > √λ/2.
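The statement that the equal-sided rectangle maximizes the probability content when μ = 0 and R is a correlation matrix can be checked numerically. The sketch below integrates the bivariate t pdf over R(a) = {|x_1| ≤ a, |x_2| ≤ λ/(4a)} with a plain two-dimensional midpoint rule and compares a few values of a; the function names, the grid size, and the choice ν = 5, ρ = 0.5, λ = 4 are assumptions for illustration only.

```python
import math

def bivariate_t_pdf(x1, x2, nu, rho):
    """Central bivariate t pdf with nu df and correlation rho."""
    q = (x1 * x1 - 2.0 * rho * x1 * x2 + x2 * x2) / (1.0 - rho * rho)
    norm = 2.0 * math.pi * math.sqrt(1.0 - rho * rho)
    return (1.0 + q / nu) ** (-(nu + 2.0) / 2.0) / norm

def rectangle_content(a, lam, nu, rho, n=200):
    """P(R(a)) for the area-lam rectangle with half sides a and lam / (4a)."""
    b = lam / (4.0 * a)
    h1, h2 = 2.0 * a / n, 2.0 * b / n
    total = 0.0
    for i in range(n):
        x1 = -a + (i + 0.5) * h1
        for j in range(n):
            x2 = -b + (j + 0.5) * h2
            total += bivariate_t_pdf(x1, x2, nu, rho)
    return total * h1 * h2

if __name__ == "__main__":
    lam = 4.0  # the claimed maximizer is a = sqrt(lam) / 2 = 1
    for a in (0.7, 0.85, 1.0, 1.2, 1.4):
        print(a, rectangle_content(a, lam, 5.0, 0.5))
```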


6.13 Monte Carlo Evaluation 


Let X be a central p-variate t random vector with correlation matrix R
and degrees of freedom ν. Vijverberg (1996) developed a family of
simulators of the multivariate t probability p = Pr(X ≤ x_0) based on Monte
Carlo simulation and recursive importance sampling. We shall provide
the basic steps of this rather complicated but powerful procedure.

Define Z = AX, where A is an upper triangular matrix such that
A^T A = R. Then it is well known that the pdf of Z can be expressed as
a product of univariate Student’s t pdfs


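\[
f(\mathbf{z}) = \prod_{k=1}^{p} f_1\left(z_k; \sigma_k^2, \nu_k\right),
\]

where σ_k² and ν_k denote the scale parameter and the degrees of freedom
of the kth factor, and

\[
f_1\left(z; \sigma^2, \nu\right) = \frac{\Gamma((\nu+1)/2)}{\sqrt{\nu\pi}\,\sigma\,\Gamma(\nu/2)}
\left[1 + \frac{z^2}{\nu\sigma^2}\right]^{-(\nu+1)/2}.
\]
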
We shall denote by F_1(x; σ², ν) the cdf corresponding to f_1. For
convenience denote A^{-1} = B = (b_{ij}), where B is an upper triangular matrix
with b_{pp} = 1 and b_{jj} > 0 for all j. Then, since the integral over X covers
the region X ≤ x_0, the integral over Z is determined by the inequality
BZ ≤ x_0, and the bounds can be written as

\[
z_p \le x_{0p} = z_{p0}
\]

and

\[
z_k \le b_{kk}^{-1} \left( x_{0k} - \sum_{i=k+1}^{p} b_{ki} z_i \right)
= z_{k0}\left(x_{0k}, z_{k+1}, \ldots, z_p\right)
\]


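for k = 1, 2, ..., p − 1. Utilizing this transformation, the probability p
can be written as p = J_p, where

\[
\begin{aligned}
J_k &= \int_{-\infty}^{z_{k0}} f_1\left(z_k; \sigma_k^2, \nu_k\right) J_{k-1}\, dz_k \\
&= \int_{-\infty}^{z_{k0}} F_1\left(z_{k0}; \sigma_k^2, \nu_k\right) J_{k-1}\,
f_1^c\left(z_k; z_{k0}, \sigma_k^2, \nu_k\right) dz_k \\
&= E_{f_1^c}\left[ F_1\left(z_{k0}; \sigma_k^2, \nu_k\right) J_{k-1} \right], \qquad k = 2, 3, \ldots, p,
\end{aligned}
\tag{6.36}
\]

where

\[
f_1^c\left(z_k; z_{k0}, \sigma_k^2, \nu_k\right) =
\frac{f_1\left(z_k; \sigma_k^2, \nu_k\right)}{F_1\left(z_{k0}; \sigma_k^2, \nu_k\right)}
\]

is the univariate conditional (truncated) t pdf for z_k ≤ z_{k0} and
J_1 = F_1(z_{10}; σ_1², ν_1). Hence, J_k is the probability over the range of
(z_1, ..., z_k) conditional on the values for (z_{k+1}, ..., z_p).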

The Monte Carlo simulation starts off by drawing random values of
z_p from the distribution $f_1^c(\cdot; z_{p0}, \sigma_p^2, \nu_p)$, which we shall denote by
$\tilde z_{p,r}$, r = 1, ..., R. Each of these yields a different bound $\tilde z_{p-1,0,r}$ and
parameter value $\tilde\sigma^2_{p-1,r}$ for each draw of z_{p−1}; $\tilde z_{p-1,r}$ is then drawn from
the distribution $f_1^c(\cdot; \tilde z_{p-1,0,r}, \tilde\sigma^2_{p-1,r}, \nu_{p-1})$. This process continues until
$\tilde z_{2,r}$ is drawn and $\tilde J_{1,r} = F_1(\tilde z_{10,r}; \tilde\sigma^2_{1,r}, \nu_1)$ is computed with a commonly
available approximation routine for the univariate Student’s t cdf. The
simulated estimate of p is then found as the sample average of the J_p
values across the simulated sample of R elements

\[
\hat p = \bar J_p = \frac{1}{R} \sum_{r=1}^{R} F_1\left(z_{p0}; \sigma_p^2, \nu_p\right) \tilde J_{p-1,r},
\]

where $\tilde J_{k,r} = F_1(\tilde z_{k0,r}; \tilde\sigma^2_{k,r}, \nu_k)\, \tilde J_{k-1,r}$ for k = 2, ..., p − 1. It is more efficient
to estimate J_p by averaging over a large number of elements than to
obtain close approximations of its components J_k for k < p. Therefore,
a better estimate for p is

\[
\hat p = \frac{1}{R} \sum_{r=1}^{R} \left\{ \prod_{k=1}^{p} F_1\left(\tilde z_{k0,r}; \tilde\sigma^2_{k,r}, \nu_k\right) \right\}.
\]

The right-hand side of (6.36) remains unchanged if the integrand is
divided and multiplied by the same nonzero function of z. Let g_p be a
p-dimensional pdf such that

\[
g_p(\mathbf{z}; \boldsymbol{\nu}) = \prod_{k=1}^{p} g_1\left(z_k; \tau_k^2\right),
\]

where g_1 is a univariate pdf of a type to be mentioned below with
Var(z_k) = τ_k² = σ_k² ν_k/(ν_k − 2), and σ_k² and ν_k are as defined above.
Let G_1(z_k; τ_k²) be the associated cdf, and let

\[
g_1^c\left(z_k; z_{k0}, \tau_k^2\right) = \frac{g_1\left(z_k; \tau_k^2\right)}{G_1\left(z_{k0}; \tau_k^2\right)}
\]

be the conditional pdf. Finally, let

\[
g_p^c(\mathbf{z}; \mathbf{z}_0, \boldsymbol{\nu}) = \prod_{k=1}^{p} g_1^c\left(z_k; z_{k0}, \tau_k^2\right).
\]

With these definitions, one can write p = J_p in terms of

\[
\begin{aligned}
J_k &= \int_{-\infty}^{z_{k0}}
\frac{f_1\left(z_k; \sigma_k^2, \nu_k\right)}{g_1^c\left(z_k; z_{k0}, \tau_k^2\right)}\,
J_{k-1}\, g_1^c\left(z_k; z_{k0}, \tau_k^2\right) dz_k \\
&= E_{g_1^c}\left[ \frac{f_1\left(z_k; \sigma_k^2, \nu_k\right)}{g_1^c\left(z_k; z_{k0}, \tau_k^2\right)}\, J_{k-1} \right],
\qquad k = 2, \ldots, p,
\end{aligned}
\]

and J_1 = F_1(z_{10}; σ_1², ν_1). Clearly, g_p^c, and, more particularly, g_1^c, is an
importance sampling density (see, for example, Hammersley and Handscomb,
1964). To evaluate p, the procedure is as follows: Generate random
drawings $\tilde z_{p,r}$ for r = 1, ..., R from the distribution $g_1^c(\cdot; z_{p0}, \tau_p^2)$;
compute the implied values $\tilde z_{p-1,0,r}$ and $\tilde\tau^2_{p-1,r}$ for each drawing of z_{p−1};
draw $\tilde z_{p-1,r}$ from the distribution $g_1^c(\cdot; \tilde z_{p-1,0,r}, \tilde\tau^2_{p-1,r})$; and continue
on until $\tilde z_{2,r}$ is drawn and J_1 is computed. Based on this procedure, p
may be written in the form

\[
\hat p = \frac{1}{R} \sum_{r=1}^{R} F_1\left(\tilde z_{10,r}; \tilde\sigma^2_{1,r}, \nu_1\right)
\prod_{k=2}^{p} \frac{f_1\left(\tilde z_{k,r}; \tilde\sigma^2_{k,r}, \nu_k\right)}{g_1^c\left(\tilde z_{k,r}; \tilde z_{k0,r}, \tilde\tau^2_{k,r}\right)}.
\]

Three suitable choices for the importance density function are


• the logit with

\[
g_1(z) = \frac{\lambda}{\tau}\, q (1 - q),
\]

where

\[
q = \left[1 + \exp(-\lambda z/\tau)\right]^{-1}
\]

and λ = π/√3;
• transformed beta (2, 2) density (Vijverberg, 1995) with

\[
g_1(z) = 6 q^2 (1 - q)^2,
\]

where

\[
q = \frac{\exp(z/\sigma)}{1 + \exp(z/\sigma)};
\]

• the normal N(0, σ²) density.
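A quick check of the logit choice: with q = 1/(1 + exp(−λz/τ)) and λ = π/√3, the density (λ/τ) q (1 − q) is a logistic density rescaled so that its variance equals τ², which is what the matching condition Var(z_k) = τ_k² requires. The sketch below verifies the total mass and the variance by a midpoint rule; the function names and integration limits are assumptions made for this example.

```python
import math

def logit_density(z, tau):
    """(lam / tau) q (1 - q) with q = 1 / (1 + exp(-lam z / tau)), lam = pi / sqrt(3)."""
    lam = math.pi / math.sqrt(3.0)
    q = 1.0 / (1.0 + math.exp(-lam * z / tau))
    return lam / tau * q * (1.0 - q)

def moments(tau, half_width=40.0, n=40000):
    """Midpoint-rule estimates of the total mass and the variance of the density."""
    lower = -half_width * tau
    h = 2.0 * half_width * tau / n
    mass = 0.0
    var = 0.0
    for i in range(n):
        z = lower + (i + 0.5) * h
        g = logit_density(z, tau)
        mass += g * h
        var += z * z * g * h
    return mass, var

if __name__ == "__main__":
    print(moments(1.5))  # mass close to 1, variance close to 1.5 ** 2
```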


Vijverberg (1997, 2000) has developed a new family of simulators that 
extends the above research on the simulation of high-order probabilities. 
For instance, Vijverberg (2000) has reported that the gain in precision 
using the new family translates into a 40% savings in computational 
time. 


7 
Probability Inequalities 


Probability inequalities on Pr(Y_1 ≤ y_1, Y_2 ≤ y_2, ..., Y_p ≤ y_p) for
multivariate distributions have been a popular topic of investigation since the
1950s. It is well known (Khatri, 1967; Scott, 1967; Šidák, 1965, 1967)
that, for arbitrary positive numbers y_1, y_2, ..., y_p, the inequality

\[
\Pr\left(|Y_1| \le y_1, |Y_2| \le y_2, \ldots, |Y_p| \le y_p\right) \ge \prod_{k=1}^{p} \Pr\left(|Y_k| \le y_k\right)
\]

holds for any random vector Y^T = (Y_1, Y_2, ..., Y_p) having the multivariate
normal distribution with zero means and an arbitrary correlation
matrix. A question then arises as to whether there is an analog of this
for multivariate t distributions.


7.1 Dunnett and Sobel’s Probability Inequalities 
Dunnett and Sobel (1955) obtained bounds for the probability integral

\[
P = \Pr\left(X_1 \le x_1, X_2 \le x_2, \ldots, X_p \le x_p\right),
\]

x_1 > 0, x_2 > 0, ..., x_p > 0, when (X_1, X_2, ..., X_p) follows the central
p-variate t distribution with degrees of freedom ν and the correlation
matrix R having the special structure r_{ij} = b_i b_j for all i ≠ j. Using the
definition that X can be represented as (Z_1/S, Z_2/S, ..., Z_p/S), where
Z is a p-variate normal random vector with correlation matrix R and
νS²/σ² is an independent chi-squared random variable with degrees of
freedom ν, one can rewrite P as

\[
P = \Pr\left\{\frac{Z_1}{S} \le x_1, \frac{Z_2}{S} \le x_2, \ldots, \frac{Z_p}{S} \le x_p\right\}
= \int_0^\infty G\left(x_1 s, x_2 s, \ldots, x_p s\right) h(s)\, ds, \tag{7.1}
\]


where G is the joint cdf of Z and h is the pdf of S. If Y_0, Y_1, Y_2, ..., Y_p are
independent standard normal random variables, then one can represent
$Z_j = \sqrt{1 - b_j^2}\, Y_j - b_j Y_0$ for j = 1, 2, ..., p. Using this result, one can
rewrite G as

\[
\begin{aligned}
G\left(x_1 s, x_2 s, \ldots, x_p s\right)
&= \Pr\left\{\sqrt{1 - b_j^2}\, Y_j - b_j Y_0 \le x_j s,\ j = 1, 2, \ldots, p\right\} \\
&= \int_{-\infty}^{\infty} \left\{ \prod_{j=1}^{p}
\Phi\left(\frac{x_j s + b_j y}{\sqrt{1 - b_j^2}}\right) \right\} \phi(y)\, dy,
\end{aligned}
\tag{7.2}
\]

where φ and Φ are, respectively, the pdf and the cdf of the standard
normal distribution. Using the well-known inequality

\[
E\left\{\prod_{j=1}^{p} F_j(Y)\right\} \ge \prod_{j=1}^{p} E\left\{F_j(Y)\right\} \tag{7.3}
\]

(where F_j denotes a cdf), one can now bound G by


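\[
G\left(x_1 s, x_2 s, \ldots, x_p s\right) \ge \prod_{j=1}^{p} G\left(x_j s\right),
\]

where G(x_j s) = Pr{Z_j ≤ x_j s} denotes the corresponding univariate cdf.
Substituting this result into (7.1) and applying (7.3) once more, one
obtains the lower bound for P given by Dunnett and Sobel (1955) as

\[
\begin{aligned}
P &\ge \int_0^\infty \prod_{j=1}^{p} G\left(x_j s\right) h(s)\, ds \\
&\ge \prod_{j=1}^{p} \int_0^\infty G\left(x_j s\right) h(s)\, ds \\
&= \prod_{j=1}^{p} \Pr\left\{Z_j \le x_j S\right\} \\
&= \prod_{j=1}^{p} \Pr\left\{X_j \le x_j\right\}.
\end{aligned}
\tag{7.4}
\]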


This lower bound for P holds more generally for any correlation matrix
R with r_{ij} ≥ 0 and any arbitrarily fixed (x_1, ..., x_p). This is a
consequence of the fact that P is an increasing function of each r_{ij} for all
i ≠ j, while other correlations are held fixed. It can be shown further
that

\[
\Pr\left(X_1 \ge x_1, \ldots, X_p \ge x_p\right) \ge \prod_{j=1}^{p} \Pr\left(X_j \ge x_j\right)
\]

and

\[
\Pr\left(|X_1| \le x_1, \ldots, |X_p| \le x_p\right) \ge \prod_{j=1}^{p} \Pr\left(|X_j| \le x_j\right).
\]

Since the bound (7.4) does not depend on r_{ij}, it can be calculated easily
from a table of the cdf of the univariate Student’s t distribution.
Dunnett and Sobel (1955) also obtained two sharper bounds by slight
modifications of the above arguments: For even p ≥ 2,

\[
P \ge \prod_{j=1}^{p/2} \Pr\left\{X_{2j-1} \le x_{2j-1},\ X_{2j} \le x_{2j}\right\}, \tag{7.5}
\]

and for odd p ≥ 3,

\[
P \ge \Pr\left\{X_1 \le x_1\right\} \prod_{j=1}^{(p-1)/2} \Pr\left\{X_{2j} \le x_{2j},\ X_{2j+1} \le x_{2j+1}\right\}. \tag{7.6}
\]
In the case where r_{ij} = ρ for all i ≠ j and x_j = x for all j, inequalities
that are sharper than (7.5) and (7.6) can be obtained. Let

\[
\beta_1(p) = \Pr\left(X_1 \le d, X_2 \le d, \ldots, X_p \le d\right)
\]

and

\[
\beta_2(p) = \Pr\left(|X_1| \le d, |X_2| \le d, \ldots, |X_p| \le d\right).
\]

It is well known that β_k(p), k = 1, 2 are monotonically increasing in
r_{ij} (i ≠ j) and, if r_{ij} ≥ 0, then

\[
\beta_k(p) \ge \beta_k^p(1), \qquad k = 1, 2. \tag{7.7}
\]

Tong (1970) provided the following sharper bounds for β_k,

\[
\beta_k(p) \ge \left\{\beta_k(m)\right\}^{p/m} \ge \beta_k^p(1) + \left\{\beta_k(2) - \beta_k^2(1)\right\}^{p/2}, \tag{7.8}
\]

where p ≥ m ≥ 2. These inequalities certainly improve on (7.7), but
neither of them is very sharp when p is large. Also observe that the
first inequality in (7.8) depends on p and m only through their ratio
p/m. Hence, for fairly large p and m, as long as p/m is close to 1 (even if
the difference p − m is not small), the first inequality is quite adequate.
If ρ = 0, then a necessary and sufficient condition for β_k(p) = β_k^p(1) for
every fixed p is that ν → ∞.

Recently Seneta (1993) pointed out that the “sub-Markov” inequality

\[
\beta_1(p) \ge \frac{\left[\Pr\{X_1 \le x, X_2 \le x\}\right]^{p-1}}{\left[\Pr\{X_1 \le x\}\right]^{p-2}} \tag{7.9}
\]

is sharper than the corresponding inequality

\[
\beta_1(p) \ge \left[\Pr\{X_1 \le x, X_2 \le x\}\right]^{p/2} \tag{7.10}
\]

as given by (7.8). This fact is illustrated in the following table, which is
taken from Seneta (1993).


Comparison of the bounds (7.10) and (7.9) for P.
x chosen such that the true value of P = 0.95


          ν      x       Bound (7.10)   Bound (7.9)

p = 3     10     2.34    0.945          0.946
          15     2.24    0.946          0.947
          20     2.19    0.945          0.946
          60     2.10    0.944          0.945
p = 9     10     2.81    0.921          0.921
          15     2.67    0.924          0.924
          20     2.60    0.926          0.927
          60     2.48    0.934          0.936


Actually, (7.9) is a particular case of the following inequalities

\[
\beta_1(p) \ge \Pr\left(X_j \le x;\ j = 1, \ldots, m-1\right)
\times \left\{\Pr\left(X_m \le x \mid X_j \le x;\ j = 1, \ldots, m-1\right)\right\}^{p-m+1} \tag{7.11}
\]

and

\[
\beta_2(p) \ge \Pr\left(|X_j| \le x;\ j = 1, \ldots, m-1\right)
\times \left\{\Pr\left(|X_m| \le x \mid |X_j| \le x;\ j = 1, \ldots, m-1\right)\right\}^{p-m+1}
\]

given by Glaz and Johnson (1984), who also provided a formal proof of
the fact that (7.11) is sharper than (7.8).
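Why the sub-Markov bound is never worse can be seen with a line of arithmetic. Writing β(1) = Pr{X_1 ≤ x} and β(2) = Pr{X_1 ≤ x, X_2 ≤ x}, the ratio of bound (7.9) to bound (7.10) is [β(2)/β(1)²]^((p−2)/2), which is at least 1 whenever β(2) ≥ β(1)², that is, under the positive dependence already expressed by (7.7). The sketch below evaluates both bounds; the function names and the numerical inputs are hypothetical illustrations and are not taken from Seneta's table.

```python
def tong_bound(beta2, p):
    """Bound (7.10): beta(2) raised to the power p / 2."""
    return beta2 ** (p / 2.0)

def seneta_bound(beta1, beta2, p):
    """Bound (7.9): beta(2)^(p - 1) divided by beta(1)^(p - 2)."""
    return beta2 ** (p - 1) / beta1 ** (p - 2)

if __name__ == "__main__":
    # Illustrative inputs only: beta(1) = 0.98 and beta(2) = 0.965 >= 0.98 ** 2.
    for p in (3, 9):
        print(p, tong_bound(0.965, p), seneta_bound(0.98, 0.965, p))
```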

Dunnett (1989) wrote a Fortran program for evaluating the integral
(7.2). It uses Simpson’s rule to compute an approximation to (7.2) in
such a way that a prescribed accuracy is achieved. To approximate the
integral of a function α(x), say, over an interval [a, b] using Simpson’s
rule, the value of the function is computed at the two end points and at
the midpoint of the interval; then the approximate value of the integral
is given by

\[
\frac{b - a}{6} \left[ \alpha(a) + 4\alpha\left(\frac{a + b}{2}\right) + \alpha(b) \right]
\]

with its error bounded by α_4(a, b)(b − a)^5/2880, where α_4(a, b) is a bound
on the absolute value of the fourth derivative of α(x) over the interval
(see, for example, page 66 in Shampine and Allen, 1973). The central
processor unit time (on a VAX 8600 computer using single-precision
arithmetic) taken to compute (7.2) ranges from 0.01 to 2.37 seconds for
cases of equal correlation (r_{ij} = ρ) and identical ranges of integration
(x_j = x). Slightly longer computing times are required for unequal
correlations or different limits of integration. Dunnett (1989) suggested that
his program can be used along with an appropriate numerical integration
routine, such as the International Mathematical and Statistical Libraries’
(1987, Volume 1, Chapter 4) QDAGI, to evaluate the multivariate t
probability integral (7.1).
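The one-dimensional integral for the product correlation structure r_ij = b_i b_j, namely the integral over y of the product of Φ((h_j + b_j y)/√(1 − b_j²)) times φ(y), can be sketched with a composite Simpson rule. This is a minimal fixed-step illustration, not Dunnett's adaptive Fortran routine; the function names, the number of subintervals, and the truncation of the range to [−8, 8] are assumptions.

```python
import math

def std_normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def std_normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def simpson(func, a, b, n):
    """Composite Simpson rule with n (even) subintervals."""
    h = (b - a) / n
    total = func(a) + func(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * func(a + i * h)
    return total * h / 3.0

def product_structure_cdf(h, b, n=400, cutoff=8.0):
    """Pr(Z_j <= h_j for all j) when Corr(Z_i, Z_j) = b_i b_j and |b_j| < 1."""
    def integrand(y):
        value = std_normal_pdf(y)
        for hj, bj in zip(h, b):
            value *= std_normal_cdf((hj + bj * y) / math.sqrt(1.0 - bj * bj))
        return value
    return simpson(integrand, -cutoff, cutoff, n)

if __name__ == "__main__":
    # Two variables with correlation 0.5 and limits 0: the exact value is 1/3.
    root_half = math.sqrt(0.5)
    print(product_structure_cdf([0.0, 0.0], [root_half, root_half]))
```

Feeding this function the limits x_j s and integrating the result against the density of S would give the multivariate t probability, which is the two-stage scheme described above.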


7.2 Dunn’s Probability Inequalities 


The univariate Student’s t distribution has the property that the
probability evaluated from −x to +x is an increasing function of the degrees
of freedom ν; this also applies to the probability from −∞ to +x (see,
for example, Ghosh, 1973, for details). Dunn (1965) pointed out that
this monotonicity does not generalize to p dimensions in the usual
multivariate t distribution. Specifically, let X have the central p-variate t
distribution with degrees of freedom ν and correlation matrix R. If F(x)
is defined by

\[
F(x) = \Pr\left\{ \bigcap_{i=1}^{p} \left(-x \le X_i \le x\right) \right\},
\]

then F(x) equals the probability mass in the multivariate t distribution
evaluated over a p-dimensional hypercube centered at the origin with
half side x, and F(x) is the distribution of the maximum of the absolute
values of the p X variables. Similarly, if G(x) is defined by

\[
G(x) = \Pr\left\{ \bigcap_{i=1}^{p} \left(-\infty < X_i \le x\right) \right\},
\]

then G(x) equals the probability mass evaluated from −∞ to x in each
direction and G(x) is the distribution of the maximum of the p X
variables. Dunn showed that, for any given x > 0 and degrees of freedom
ν_1 > ν_2, there exists an integer K such that, for all p > K,

\[
F_{p,\nu_1}(x) < F_{p,\nu_2}(x)
\]

and

\[
G_{p,\nu_1}(x) < G_{p,\nu_2}(x).
\]

Here, F_{p,ν} and G_{p,ν} are F and G as defined above, with dimension p and
degrees of freedom ν. This result covers the case of all correlations equal
to 0. When all correlations are equal to 1, the distribution is the same as
the univariate Student’s t distribution, so that, for all dimensions, F(x)
and G(x) are monotonically increasing functions of ν. Other correlation
matrices may be considered in some sense to lie between these two
extremes. In various unpublished tables of F(x), the change is found to
occur at a dimension where F(x) is approximately 0.25 or 0.30.
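The reversal can be reproduced numerically in the zero-correlation case. Writing X_i = Z_i/S with independent standard normal Z_i and S² distributed as chi-squared with ν degrees of freedom divided by ν, one gets F_{p,ν}(x) = E[(2Φ(xS) − 1)^p], a one-dimensional integral against the density of S. The sketch below evaluates it with a fixed-step Simpson rule; the function names, the truncation of the range at s = 12, and the chosen values of p and ν are assumptions for illustration.

```python
import math

def s_pdf(s, nu):
    """pdf of S = sqrt(chi-square_nu / nu)."""
    if s <= 0.0:
        return 0.0
    log_c = math.log(2.0) + (nu / 2.0) * math.log(nu / 2.0) - math.lgamma(nu / 2.0)
    return math.exp(log_c + (nu - 1.0) * math.log(s) - nu * s * s / 2.0)

def cube_probability(x, p, nu, upper=12.0, n=6000):
    """F_{p,nu}(x) = E[(2 Phi(x S) - 1)^p] for uncorrelated components."""
    def integrand(s):
        # 2 Phi(t) - 1 equals erf(t / sqrt(2))
        return math.erf(x * s / math.sqrt(2.0)) ** p * s_pdf(s, nu)
    h = upper / n
    total = integrand(0.0) + integrand(upper)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * integrand(i * h)
    return total * h / 3.0

if __name__ == "__main__":
    for p in (1, 5, 20):
        print(p, cube_probability(1.0, p, 2), cube_probability(1.0, p, 30))
```

For p = 1 the larger number of degrees of freedom gives the larger probability, while for p = 20 the ordering is reversed, in line with Dunn's result.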


7.3 Halperin’s Probability Inequalities 


Halperin (1967) extended the inequality (7.4) for generalized bivariate
t distributions as follows. Let (Y_{i1}, Y_{i2}), i = 0, 1, 2, ..., r, r ≥ 1 be
independent samples from a bivariate normal distribution with zero
means, variances σ_1², σ_2², and covariances σ_1σ_2ρ_i, |ρ_i| < 1. Let Y_{i1},
i = r+1, ..., r+n and Y_{i2}, i = r+1, ..., r+m be independent normal
samples with zero means and variances σ_1² and σ_2², respectively, and
independent of (Y_{i1}, Y_{i2}), i = 0, 1, 2, ..., r. Define

\[
(X_1, X_2) = \left(\frac{Y_{01}}{S_1}, \frac{Y_{02}}{S_2}\right),
\]

where

\[
S_1^2 = \frac{1}{r+n} \sum_{i=1}^{r+n} Y_{i1}^2
\]

and

\[
S_2^2 = \frac{1}{r+m} \sum_{i=1}^{r+m} Y_{i2}^2.
\]

Halperin (1967) then showed that the probability integral of (X_1, X_2)
satisfies the inequality

\[
\Pr\left(|X_1| \le x_1, |X_2| \le x_2\right) \ge \Pr\left(|X_1| \le x_1\right) \Pr\left(|X_2| \le x_2\right)
\]

for all real numbers x_1 and x_2.


7.4 Šidák’s Probability Inequalities 

In the bivariate case considered above, it is assumed that the correlation
between Y_{i1} and Y_{i2} may be different for different i’s. For a general p,
but for a special correlation structure of Y’s, Šidák (1967) established
the following result. Let Y^T = (Y_1, Y_2, ..., Y_p) have a p-variate normal
distribution with zero means and an arbitrary correlation matrix. Let
Z_i^T = (Z_{i1}, Z_{i2}, ..., Z_{ip}), i = 1, ..., n be a p-variate normal random
sample, which is mutually independent and independent of Y, each of
which has zero means, unit variances, and the decomposable correlation
structure given by

\[
\mathrm{Corr}\left(Z_{ki}, Z_{kj}\right) = b_i b_j
\]

for i, j = 1, ..., p, i ≠ j; k = 1, ..., n with 0 ≤ b_i < 1, i = 1, ..., p.
Then, for arbitrary positive numbers y_1, ..., y_p,

\[
\Pr\left( \frac{|Y_1|}{\sqrt{Z_{11}^2 + \cdots + Z_{n1}^2}} \le y_1, \ldots,
\frac{|Y_p|}{\sqrt{Z_{1p}^2 + \cdots + Z_{np}^2}} \le y_p \right)
\ge \prod_{i=1}^{p} \Pr\left( \frac{|Y_i|}{\sqrt{Z_{1i}^2 + \cdots + Z_{ni}^2}} \le y_i \right). \tag{7.12}
\]

Essentially the same result, assuming more generally only |b_i| < 1,
follows by an easy specialization of Corollary 8 in Khatri (1967).

A general proof of the inequality (7.12) under the assumption that
Y and all Z_i’s have the same normal distribution with zero means and
an arbitrary covariance matrix was provided by Scott (1967). Unfortunately,
this proof is correct only for p = 2, and Šidák (1971) produced a
counterexample showing its incorrectness for p > 2. Šidák (1971) went
on to show that, if

\[
\mathrm{Corr}\left(Y_i, Y_j\right) = c_i c_j r_{ij}
\]

for i, j = 1, 2, ..., p; i ≠ j with |c_j| < 1 (j = 1, 2, ..., p) and {r_{ij}} any
fixed correlation matrix, and if

\[
\mathrm{Corr}\left(Z_{li}, Z_{lj}\right) = b_{li} b_{lj}
\]

for i, j = 1, 2, ..., p; i ≠ j; l = 1, 2, ..., n with |b_{li}| < 1 (i = 1, 2, ..., p,
l = 1, 2, ..., n), then the left-hand side probability in (7.12) as a function
of c_j is nonincreasing for −1 < c_j ≤ 0 and nondecreasing for 0 ≤ c_j <
1, so that it has a minimum for c_j = 0 and, as a function of b_{li}, is
nonincreasing for −1 < b_{li} ≤ 0 and nondecreasing for 0 ≤ b_{li} < 1, so
that it has a minimum for b_{li} = 0. Hence, (7.12) is also true for this
more general correlation structure.

Šidák (1973) obtained an inequality using exchangeability when X is
a central p-variate t random vector with the equicorrelation structure
r_{ij} = ρ, i ≠ j. He showed that

\[
\Pr\left(b < X_1 < a, \ldots, b < X_p < a\right)
\ge \left\{\Pr\left(b < X_1 < a, \ldots, b < X_r < a\right)\right\}^{p/r}
\ge \left\{\Pr\left(b < X_1 < a\right)\right\}^{p}
\]

for all p ≥ r ≥ 2 and a > b. In an earlier paper, Tong (1970) obtained
similar results for a much larger class of random vectors.


7.5 Tong’s Probability Inequalities 


We noted in Chapter 1 that the multivariate t density is Schur-concave 
in the particular case $r_{ij} = \rho$, $i \neq j$. Tong (1982) used this property to 
derive certain probability inequalities. He showed that if $f : \mathbb{R}^p \to [0, \infty)$ 
is Borel-measurable and Schur-concave, then, provided that the integral 
exists, $\int_{A(x)} f(y) \, dy$ is also a Schur-concave function of $(x_1, \ldots, x_p)$, 
where $A(x)$ denotes the rectangular set 

\[ A(x) = \{ y \mid y \in \mathbb{R}^p, \ |y_j| \le x_j, \ j = 1, \ldots, p \}. \]

Taking $f$ to be the pdf of the multivariate t, it follows that 

\[ \Pr(|X_j| \le x_j, \ j = 1, \ldots, p) \le \Pr(|X_j| \le \bar{x}, \ j = 1, \ldots, p), \]

where $\bar{x} = (1/p)(x_1 + \cdots + x_p)$. 




8 


Percentage Points 


Since the 1950s, numerous authors have computed percentage points of 
multivariate t distributions, an indication of the interest in problems 
leading to applications of this “new” distribution. This research 
continued well into the 1990s and remains active. Although some of the 
results have, in our opinion, lost their practical importance, it is 
essential for historical reasons to describe a majority of these 
contributions. Doing so should help historians and experts in 
multivariate distributions gain a better perspective on developments 
in the area. Moreover, many of the techniques involved in these 
calculations are ingenious and worthy of emulation and further 
investigation. 


8.1 Dunnett and Sobel’s Percentage Points 


Let $(X_1, X_2)$ have the bivariate t distribution with joint pdf (6.1). For 
given $\nu$, $\rho$, and probability level $\gamma$, let $d$ denote the equicoordinate 
percentage point satisfying 


\[ \int_{-\infty}^{d} \int_{-\infty}^{d} f(x_1, x_2; \nu, \rho) \, dx_1 \, dx_2 = \gamma. \]


The value of $d$ can be determined for any $\nu$ by trial and error using 
(6.3) and (6.4). However, this procedure becomes more involved as $\nu$ 
increases. Dunnett and Sobel (1954) derived an asymptotic expansion 
in powers of $1/\nu$ that expresses $d$ in terms of the corresponding quantity 
$e$ for the bivariate normal distribution, where $e$ is defined by 


\[ \int_{-\infty}^{e} \int_{-\infty}^{e} f(x_1, x_2; \infty, \rho) \, dx_1 \, dx_2 = \gamma \]

and can be obtained by interpolation in the classical tables of Pearson 
(1931), for example. Their expansion yields a good approximation of 
$d$ even for moderately small values of $\nu$. The method of derivation is 
essentially the same as that used by Fisher (1941) in deriving an asymptotic 
expansion for the percentage points of the univariate Student’s t 
distribution. Up to the terms of $O(1/\nu)$, Dunnett and Sobel obtained 


e t'(t 
d = e~—<e?4+1- 
e 5 fe + FO } 
and by inverting 


e = arifesi- SO}. 


They also tabulated numerical values of the coefficients of the terms $1/\nu^i$ 
for $i = 0, 1, 2, 3, 4$. 


8.2 Krishnaiah and Armitage’s Percentage Points 


Consider $X$ having a central $p$-variate t distribution with degrees of 
freedom $\nu$ and with the equicorrelation structure $r_{ij} = \rho$, $i \neq j$. Krishnaiah 
and Armitage (1966) evaluated the multivariate percentage point $d$ by 
solving the integral equation 


\[ \int_{-\infty}^{d} \cdots \int_{-\infty}^{d} f(x_1, \ldots, x_p; \nu, \rho) \, dx_1 \cdots dx_p = \gamma \tag{8.1} \]


and produced extensive tables for all combinations of $p = 1(1)10$; 
$\nu = 5(1)35$; $\rho = 0.05(0.05)0.9$; and $\gamma = 0.90, 0.95, 0.975, 0.99$ (see also 
Armitage and Krishnaiah, 1965). These computations use the approximation 
that 


C pyv—-l 


A z” exp (2°) f° 
ra saf rem Le 


(fae) 


where $\Phi$ is the cdf of the standard normal distribution and the upper 
limit $c$ is chosen large enough to make the error of approximation as 
small as desired. Krishnaiah and Armitage (1966) took $c = 10$. 
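
As a rough illustration of how an equation such as (8.1) can be solved numerically, the sketch below evaluates $P(d)$ for $\rho \ge 0$ through the double-integral representation that appears as (8.9) in Section 8.13 (taking all $d_j = d$, so that the inner probability is a $p$th power of $\Phi$) and then locates $d$ by bisection. It is not the procedure of Krishnaiah and Armitage: the Simpson rule, the grid sizes, the integration limits, and the function names `prob_one_sided` and `percentage_point` are assumptions made for this example, and the code assumes $\nu \ge 2$.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def chi_scale_pdf(s, nu):
    """pdf of S = sqrt(chi^2_nu / nu) at s (returns 0 for s <= 0; assumes nu >= 2)."""
    if s <= 0.0:
        return 0.0
    log_h = (math.log(2.0) + 0.5 * nu * math.log(0.5 * nu)
             + (nu - 1.0) * math.log(s) - 0.5 * nu * s * s
             - math.lgamma(0.5 * nu))
    return math.exp(log_h)


def simpson(f, a, b, n):
    """Composite Simpson rule on [a, b] with n (even) subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0


def prob_one_sided(d, p, nu, rho, n_s=120, n_z=80):
    """P(d) = Pr(X_1 <= d, ..., X_p <= d) for equicorrelation 0 <= rho < 1."""
    a = math.sqrt(rho)
    b = math.sqrt(1.0 - rho)

    def outer(s):
        def inner(z):
            return STD_NORMAL.cdf((d * s + a * z) / b) ** p * STD_NORMAL.pdf(z)
        return simpson(inner, -8.0, 8.0, n_z) * chi_scale_pdf(s, nu)

    # The density of S is negligible beyond 1 + 8/sqrt(nu) (illustrative cutoff).
    return simpson(outer, 0.0, 1.0 + 8.0 / math.sqrt(nu), n_s)


def percentage_point(gamma, p, nu, rho, lo=-20.0, hi=20.0, tol=1e-5):
    """Bisection for d with P(d) = gamma; P is increasing in d."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if prob_one_sided(mid, p, nu, rho) < gamma:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

For $p = 1$ the result should reduce to the ordinary one-sided Student t percentage point, which gives a simple check on the quadrature settings.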


8.3 Gupta et al.’s Percentage Points 


Gupta et al. (1985) solved equation (8.1) by setting 


|P(d)-y| < 107° 


with $P(d)$ computed by the approximation (6.14). The numerical evaluation 
of (6.14) involves the evaluation of the derivatives $G^{(k)}(d)$ given 
by (6.15) for $k = 0, 1, \ldots, 8$. But it is easily seen from (7.2) that 


ae a Vez +d 
(k) = —— p 
Gd) OR? fs ( ET ) Hy, (z)b(z)dz, (8.2) 
where $H_k(z)$ is the Hermite polynomial of degree $k$ given by (6.11). By 
letting $A = \sqrt{\rho}/\sqrt{1-\rho}$ and $B = 1/\sqrt{1-\rho}$ and changing the variable 
by the transformation $u = Az + Bd$, (8.2) can be rewritten as 


amin = 3 (BY em (SE) 


Gupta et al. (1985) approximated this integral by 


9A+4B k E 7 
a = 3" (wom (Ra 


and the integration was carried out by Gauss’ method over intervals of 
length $D = 0.5$ starting from $-9$ until $9A + 4B$ was included. They 
provided tabulations of the percentage point $d$ for all combinations of 


• $p = 1(1)9(2)19$; $\nu = 15(1)20, 24, 30, 36, 48, 60, 120, \infty$; 
  $\gamma = 0.75, 0.9, 0.95, 0.99$; and $\rho = 0.1, 0.2(0.1)0.6$; 

• $p = 1(1)9(2)15$; $\nu = 15, 17, 20, 24, 36, 60, 120, \infty$; $\gamma = 0.9, 0.95$; 
  and $\rho = 0.7(0.1)0.9$. 


8.4 Rausch and Horn’s Percentage Points 


Rausch and Horn (1988) considered the particular case of (8.1) when 
the common $\rho = 0$. They used the approximation 


P(d) & wa? (=). 


The weights w; are calculated according to the formula 


_ T(n+ 8) l Mug 
= ranra mO 
where $\beta = \nu/2 - 1$, 


\[ L_n^{\beta}(z) = \sum_{i=0}^{n} \binom{n+\beta}{n-i} \frac{(-z)^i}{i!} \]

are the Laguerre polynomials and $z_1, \ldots, z_n$ are the zeros of $L_n^{\beta}(z)$. 


Rausch and Horn computed $d$ for all combinations of $3 \le p \le 100$; 
$5 \le \nu \le 120$, $\nu = \infty$; and $0.5 \le \gamma \le 0.99$. 


8.5 Hahn and Hendrickson’s Percentage Points 


Hahn and Hendrickson (1971) computed percentage points $d$ by solving 
the equation 


\[ P(d) = \int_{-d}^{d} \cdots \int_{-d}^{d} f(x_1, \ldots, x_p; \nu, \rho) \, dx_1 \cdots dx_p = \gamma \tag{8.3} \]


for all combinations of $p = 1(1)6, 8, 10, 12, 15, 20$; $\nu = 3(1)12, 15, 20, 
25, 30, 40, 60$; $\rho = 0, 0.2, 0.4, 0.5$; and $\gamma = 0.90, 0.95, 0.99$. As one 
would expect, these values are comparable to the positive square root of 
the values given by Krishnaiah and Armitage (1966, Section 8.2). Hahn 
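and Hendrickson’s computations use the approximation that 


\[
P(d) = \int_0^\infty \left\{ \int_{-\infty}^{\infty}
\left[ \Phi\left( \frac{\sqrt{\rho}\,y + dz}{\sqrt{1-\rho}} \right)
- \Phi\left( \frac{\sqrt{\rho}\,y - dz}{\sqrt{1-\rho}} \right) \right]^p \phi(y) \, dy \right\} h(z) \, dz,
\]


where $\phi$ and $\Phi$ are, respectively, the pdf and the cdf of the standard 
normal distribution and $h$ is the pdf of $\sqrt{\chi^2_\nu/\nu}$. 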


8.6 Siotani’s Percentage Points 


Siotani (1964) suggested two interesting approximations for computing 
$d$ in (8.3). The first approximation is the value $d_1$ satisfying 


\[ p \Pr\left( X_1^2 > d_1^2 \right) = 1 - \gamma; \]


this approximation had been suggested previously by Dunn (1958, 1961). 
By Bonferroni’s inequalities, one notes that 


\[ 1 - \gamma - \epsilon_1(\gamma, p) \le 1 - P(d_1) \le 1 - \gamma, \]

where 


\[ \epsilon_1(\gamma, p) = \sum_{i<j} \Pr\left( X_i^2 > d_1^2, \ X_j^2 > d_1^2 \right). \]


Thus, if $\epsilon_1(\gamma, p)$ is sufficiently small, then one can use $d_1$ as a good 
estimate. A modified second approximation is the value $d_2$ satisfying 


\[ p \Pr\left( X_1^2 > d_2^2 \right) = 1 - \gamma + \epsilon_1(\gamma, p). \]

This time, one notes that 


\[ -\epsilon_2(\gamma, p) \le \gamma - P(d_2) \le \epsilon_3(\gamma, p), \]


where 

\[ \epsilon_2(\gamma, p) = \sum_{i<j} \Pr\left( X_i^2 > d_2^2, \ X_j^2 > d_2^2 \right) - \epsilon_1(\gamma, p) \]

and 

\[ \epsilon_3(\gamma, p) = \sum_{i<j<k} \Pr\left( X_i^2 > d_2^2, \ X_j^2 > d_2^2, \ X_k^2 > d_2^2 \right). \]


Since both $\epsilon_2(\gamma, p) \ge 0$ and $\epsilon_3(\gamma, p) \ge 0$, the absolute value of $\gamma - P(d_2)$ 
may be expected to be sufficiently small for the tail of the $p$-variate t 
distribution to correspond to $1 - \gamma$ for values of $\gamma \ge 0.95$. For the 
particular case $p = 2$, $\mu_1 = \mu_2 = 0$, and the equicorrelation structure $r_{ij} = \rho$, 
$i \neq j$, Siotani (1964) tabulated estimates of the probability in (8.3) for all 
combinations of $d = 2.0(0.5)4.5$; $\nu = 10(2)50(5)90, 100, 120, 150, 200, \infty$; 
and $|\rho| = 0.0(0.1)0.9, 0.95$. He also illustrated applications to interval 
estimation of the parameters in the model of a randomized block design 
and for coefficients in a normal regression equation. 
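
Because each $X_i$ is marginally a univariate Student t variable with $\nu$ degrees of freedom, the first approximation amounts to choosing $d_1$ with $T_\nu(d_1) = 1 - (1 - \gamma)/(2p)$, where $T_\nu$ is the Student t cdf. A minimal sketch of that calculation follows; the Simpson-rule cdf, the bisection, and the names `t_cdf` and `siotani_d1` are illustrative assumptions rather than Siotani's own computations.

```python
import math


def t_pdf(x, nu):
    """Density of the univariate Student t distribution with nu degrees of freedom."""
    log_c = (math.lgamma(0.5 * (nu + 1.0)) - math.lgamma(0.5 * nu)
             - 0.5 * math.log(nu * math.pi))
    return math.exp(log_c - 0.5 * (nu + 1.0) * math.log1p(x * x / nu))


def t_cdf(x, nu, n=400):
    """T_nu(x) = 1/2 + integral of the pdf over [0, x], by composite Simpson."""
    h = x / n
    total = t_pdf(0.0, nu) + t_pdf(x, nu)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * t_pdf(i * h, nu)
    return 0.5 + total * h / 3.0


def siotani_d1(gamma, p, nu, tol=1e-8):
    """Solve p * Pr(X_1^2 > d^2) = 1 - gamma for d > 0 by bisection."""
    target = 1.0 - (1.0 - gamma) / (2.0 * p)  # required value of T_nu(d)
    lo, hi = 0.0, 1.0
    while t_cdf(hi, nu) < target:  # expand until the root is bracketed
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if t_cdf(mid, nu) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

With $p = 1$ the function returns the usual two-sided Student t percentage point, and the value grows with $p$, as the Bonferroni argument requires.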


8.7 Graybill and Bowden’s Percentage Points 


Graybill and Bowden (1967) derived bounds for $d$ satisfying (8.3) for the 
special case $p = 2$ and $\rho = 0$. In this special case (8.3) becomes 


\[ \Pr\{X_1^2 \le d^2, \ X_2^2 \le d^2\} = \gamma \]

or, equivalently, 

\[ \Pr\{\max(X_1^2, X_2^2) \le d^2\} = \gamma. \]

But 

\[ \Pr\{\max(X_1^2, X_2^2) \le d^2\} = \Pr\{X_1^2 + X_2^2 \le 2F_{\gamma,2,\nu-2}\} \]

and 

\[
\Pr\{\max(X_1^2, X_2^2) \le F_{\gamma,2,\nu-2}\}
\le \Pr\{X_1^2 + X_2^2 \le 2F_{\gamma,2,\nu-2}\}
\le \Pr\{\max(X_1^2, X_2^2) \le 2F_{\gamma,2,\nu-2}\},
\]


where $F_{\gamma,2,\nu-2}$ is the percentage point of the F distribution with degrees of 
freedom 2 and $\nu - 2$ corresponding to $\gamma$. Hence, one obtains 


\[ F_{\gamma,2,\nu-2} \le d^2 \le 2F_{\gamma,2,\nu-2}, \]


the bounds given by Graybill and Bowden. In a related development, 
McCann and Edwards (1996) obtained the following lower bound for the 
left-hand side of (8.3) when the underlying correlation matrix $R$ is of 
rank $r$: 


P(d) > 1- in f tog PEEN 


-2 _ 
+Fe-1,1 {e+} ) r q(s)ds (8.4) 
r-l1 
with 

\[ \Lambda = \sum_{k=1}^{p-1} \arccos\left( r_{k,k+1} \right), \tag{8.5} \]


where $F_{m,n}$ is the cdf of an F distribution with degrees of freedom $m$ 
and $n$, and $q$ denotes the pdf of $\sqrt{F_{r,\nu}/r}$. This inequality requires only 
the evaluation of a one-dimensional integral and depends on $R$ through 
its rank $r$ and also through the constant $\Lambda$. If one writes $R = AA^T$ for 
a $p \times r$ matrix $A$ of rank $r$ with rows $a_k$, then it is interesting to note 
that the terms $\arccos(r_{k,k+1})$ in (8.5) are the angles between consecutive 
$a_k$ vectors, which are points on an $r$-dimensional sphere. It is also 
straightforward to show that the $d$ that sets the right-hand side of (8.4) 
equal to $\gamma$ is strictly increasing in $\Lambda$. This implies that, as $\Lambda \to \infty$, one 
has $d \to \sqrt{r F_{1-\gamma,r,\nu}}$, which is the percentage point given by Scheffé’s 
method. On the other hand, as $\Lambda \to 0$, one has $d \to t_{(1-\gamma)/2,\nu}$, the 
percentage point of the univariate Student’s t distribution corresponding to 
$(1 - \gamma)/2$. This is intuitively pleasing because $\Lambda \to 0$ implies that 
correlations in $R$ approach 1, in which case the $p$-dimensional distribution 
becomes one-dimensional for all practical purposes. 
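
The Graybill and Bowden bounds at the start of this section are easy to evaluate, because an F distribution with 2 numerator degrees of freedom has the closed-form upper tail $\Pr(F_{2,m} > x) = (1 + 2x/m)^{-m/2}$. The sketch below uses that identity; the denominator degrees of freedom are passed as a generic argument `m` (the text above states $\nu - 2$), and the function names are hypothetical.

```python
import math


def f_quantile_2_m(gamma, m):
    """gamma-quantile of the F distribution with 2 and m degrees of freedom.

    Inverts the closed form Pr(F_{2,m} > x) = (1 + 2x/m)^(-m/2).
    """
    return 0.5 * m * ((1.0 - gamma) ** (-2.0 / m) - 1.0)


def graybill_bowden_bounds(gamma, m):
    """Return (lower, upper) bounds for d implied by F <= d^2 <= 2F."""
    f = f_quantile_2_m(gamma, m)
    return math.sqrt(f), math.sqrt(2.0 * f)
```

The two bounds always differ by the factor $\sqrt{2}$, so they bracket $d$ fairly loosely; they are mainly useful as starting values for a numerical search.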




8.8 Pillai and Ramachandran’s Percentage Points 


Pillai and Ramachandran (1954) tabulated solutions of (8.1) and (8.3) 
for 


• $p = 1(1)8$; $\nu = 3(1)10, 12, 14(1)16(2)20, 24, 30, 40, 60, 120, \infty$; 
  $\gamma = 0.05$; and $\rho = 0$; 
• $p = 1(1)8$; $\nu = 5(5)20, 24, 30, 40, 60, 120, \infty$; $\gamma = 0.05$; and $\rho = 0$, 


respectively. For computing these percentage points, Pillai and 
Ramachandran used the pdfs of 


\[ U_p = \max(X_1, X_2, \ldots, X_p) \]


and $|U_p|$, which were derived as 


pwy = 
f(u) = Tea” s aE 


xT (== *) ap! 


and 
Ff (up) = D S p+2k-1 ye (pt2k-+v) /2 
j T(v/2)r”/2 = p re Duk a 
p+2k+v ae 
xT (=) opt 


respectively, where the a’s and b’s are the coefficients of the expansions 


2 k 
(vr Lo aè] 
= exp (FE) fag? tally + ay? T | 
and 


v ? à ky? f i 
(eCa = stow (ME) peaa], 
0 


respectively. Note that ak") = (1/2)* and pt) =1. 


8.9 Dunnett’s Percentage Points 


Dunnett (1955) tabulated solutions of (8.1) and (8.3) for $p = 1(1)9$; 
$\nu = 5(1)20, 24, 30, 40, 60, 120, \infty$; $\gamma = 0.01, 0.05$; and $\rho = 0.5$. For 
solving (8.1), Dunnett evaluated the integral in (7.1) by using tables 
of the multivariate normal cdf computed by the National Bureau of 
Standards. For (8.3), Dunnett bounded $P(d)$ by 


\[ P(d) \ge \left[ \Pr(-d < X_1 < d, \ -d < X_2 < d) \right]^{p/2} \]


(Dunnett and Sobel, 1955) and evaluated the probability integral of the 
bivariate t distribution using expressions (6.3)-(6.4). In a later paper, 
Dunnett (1964) obtained approximations for $d$ in (8.3) for all $\rho$’s lying 
between 0 and 0.5. 


8.10 Gupta and Sobel’s Percentage Points 


Gupta and Sobel (1957) solved (8.1) for the special case $\rho = 1/2$. Note 
from (6.8) that 


\[ P(d) = \Pr\left( Z \le \sqrt{2}\,d \right) \]


and that $Z = (M_p - Y)/S$ is asymptotically normal as both $\nu$ and 
$p$ tend to infinity. This allows for the use of a technique developed by 
Cornish and Fisher (1950) for computing the percentage points $d$ directly 
without first computing a table of probability integral values. Applying 
their result, Gupta and Sobel arrived at 


d = yyt a3l, +aglg + asle + asl. + a34 Ieda + of, tees, 


where yy is the percentage point of the standard normal distribution 
corresponding to y, a4 is the standardized cumulant defined in (6.12), 
and Ie, Ig, I,2, ... are tabulated in Table I of Cornish and Fisher (1950) 
for the probability levels $\gamma = 0.75, 0.90, 0.95, 0.975, 0.99, 0.995, 0.9975, 
0.999$, and $0.9995$. Gupta and Sobel tabulated $d$ for all combinations of 
$p + 1 = 2, 5, 10(1)16, 18, 20(5)40, 50$; $\nu = 15(1)20, 24, 30, 36, 40, 48, 
60, 80, 100, 120, 360, \infty$; and $\gamma = 0.75, 0.9, 0.95, 0.975, 0.99$. 

Gupta and Sobel (1957) also obtained several bounds for the percentage 
point $d$ satisfying (8.3). An upper bound for $d$ is obtained by setting 
(6.8) to be equal to $(1 + \gamma)/2$ while a lower bound is obtained by setting 
(6.8) to be equal to $\gamma$. These bounds are best for large $\gamma$’s. For smaller 
values of $\gamma$, Gupta and Sobel provided the following lower bound 


1/(2p 
NSTC uaa 
{P (1+ )} 
where Xoy is the percentage point of the chi-squared distribution (with 
p degrees of freedom) corresponding to y. 


8.11 Chen’s Percentage Points 


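Chen (1979) provided an alternative formulation of percentage points $d$ 
by solving the equation 


\[ F(d, \ldots, d; \rho, \nu) - F(-d, \ldots, -d; \rho, \nu) = \gamma \tag{8.6} \]


for the special case $\rho = 0$, where 

\[ F(x, \ldots, x; \rho, \nu) = \int_{-\infty}^{x} \cdots \int_{-\infty}^{x} f(t_1, \ldots, t_p; \nu, \rho) \, dt_1 \cdots dt_p \]


and $f$ is the joint pdf of a central $p$-variate t distribution with degrees of 
freedom $\nu$ and the equicorrelation structure $r_{ij} = \rho$, $i \neq j$. Chen noted 
that (8.6) can be rewritten as 


\[
\int_0^\infty \left\{ \Phi^p\left( \frac{dy}{\sqrt{\nu}} \right) - \Phi^p\left( -\frac{dy}{\sqrt{\nu}} \right) \right\} g(y) \, dy = \gamma,
\tag{8.7}
\]

where $\Phi(\cdot)$ is the cdf of the standard normal distribution and $g$ denotes 
the pdf of $\sqrt{\chi^2_\nu}$; further, by the change of variable $z = y/\sqrt{2}$, (8.7) 
becomes 


\[
\int_0^\infty \left\{ \Phi^p\left( \frac{\sqrt{2}\,dz}{\sqrt{\nu}} \right) - \Phi^p\left( -\frac{\sqrt{2}\,dz}{\sqrt{\nu}} \right) \right\}
\frac{2 z^{\nu-1} \exp(-z^2)}{\Gamma(\nu/2)} \, dz = \gamma.
\tag{8.8}
\]

Using the fact that the tail integral 

\[
\int_{10}^\infty \left\{ \Phi^p\left( \frac{\sqrt{2}\,dz}{\sqrt{\nu}} \right) - \Phi^p\left( -\frac{\sqrt{2}\,dz}{\sqrt{\nu}} \right) \right\}
\frac{2 z^{\nu-1} \exp(-z^2)}{\Gamma(\nu/2)} \, dz
\le \int_{10}^\infty \frac{2 z^{\nu-1} \exp(-z^2)}{\Gamma(\nu/2)} \, dz
< 10^{-8}
\]


for all $\nu \le 60$, Chen approximated the left-hand side of (8.8) by 


\[
P(d) = \int_0^{10} \left\{ \Phi^p\left( \frac{\sqrt{2}\,dz}{\sqrt{\nu}} \right) - \Phi^p\left( -\frac{\sqrt{2}\,dz}{\sqrt{\nu}} \right) \right\}
\frac{2 z^{\nu-1} \exp(-z^2)}{\Gamma(\nu/2)} \, dz
\]

and found the value $d_0$ such that $P(d_0) = \gamma$. Tables of $d_0$ were given for 
all combinations of $\gamma = 0.8, 0.9, 0.95, 0.99$; $p = 2(1)20$; $\nu = 2(1)30(5)60$; 
and $\rho = 0$. 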
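
The truncated integral is straightforward to evaluate with elementary quadrature. The following sketch does so with a composite Simpson rule and solves $P(d_0) = \gamma$ by bisection; the number of subintervals, the search bracket, and the names `chen_prob` and `chen_point` are assumptions for illustration and are not taken from Chen (1979). The code assumes $\nu \ge 1$.

```python
import math
from statistics import NormalDist

PHI = NormalDist().cdf


def chen_prob(d, p, nu, upper=10.0, n=2000):
    """Integral over [0, upper] of {Phi^p(a) - Phi^p(-a)} * 2 z^(nu-1) exp(-z^2) / Gamma(nu/2),
    with a = sqrt(2) d z / sqrt(nu); this is the case rho = 0."""
    log_norm = math.log(2.0) - math.lgamma(0.5 * nu)

    def integrand(z):
        if z <= 0.0:
            return 0.0  # the bracket Phi^p(a) - Phi^p(-a) vanishes at z = 0
        a = math.sqrt(2.0) * d * z / math.sqrt(nu)
        weight = math.exp(log_norm + (nu - 1.0) * math.log(z) - z * z)
        return (PHI(a) ** p - PHI(-a) ** p) * weight

    h = upper / n
    total = integrand(0.0) + integrand(upper)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * integrand(i * h)
    return total * h / 3.0


def chen_point(gamma, p, nu, lo=0.0, hi=50.0, tol=1e-7):
    """Bisection for d_0 with chen_prob(d_0) = gamma (increasing in d for d >= 0)."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if chen_prob(mid, p, nu) < gamma:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

For $p = 1$ the left-hand side of (8.6) is $\Pr(-d < X \le d)$, so `chen_point` should reproduce the two-sided Student t percentage point.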


8.12 Bowden and Graybill’s Percentage Points 


Bowden and Graybill (1966) presented percentage points of the bivariate 
t distribution when the percentage points are not necessarily equal, that 
is, the case where 


\[ P(d, g) = \int_{-d}^{d} \int_{-g}^{g} f(x_1, x_2) \, dx_2 \, dx_1 = \gamma. \]


Setting $D = d$ and $A = g/d$, one can rewrite 


\[ P(D, A) = \int_{-D}^{D} \int_{-AD}^{AD} f(x_1, x_2) \, dx_2 \, dx_1, \]

which can be solved for D using the expressions (6.3) and (6.4). Bowden 
and Graybill computed D for all combinations of p — 2 = 4 (2) 16 (4) 24 
(6) 30 (10) 50; $A = 0.5(0.1)1.5$; $|\rho| = 0.0(0.1)0.1(0.2)0.9$; and $\gamma = 0.90, 
0.95$. Trout and Chow (1972) extended this development to trivariate t 
distributions. Setting 


\[ \int_{-d}^{d} \int_{-g}^{g} \int_{-h}^{h} f(x_1, x_2, x_3) \, dx_3 \, dx_2 \, dx_1 = \gamma \]


and using the transformations $D = d$, $A = g/d$, and $B = h/d$, the 
following expression is obtained: 


\[ \int_{-D}^{D} \int_{-AD}^{AD} \int_{-BD}^{BD} f(x_1, x_2, x_3) \, dx_3 \, dx_2 \, dx_1 = \gamma. \]


Tables for $D$ were given for all combinations of $\nu = 5(1)9(2)29$; 
$A = 0.5(0.1)1.5$; $B = 0.5(0.5)1.5$; $r_{ij} = 0.1(0.4)0.9$; $r_{12} = r_{13} = r_{23} = 0$; 
and $\gamma = 0.05$. 


8.13 Dunnett and Tamhane’s Percentage Points 


More recently, Dunnett and Tamhane (1992) extended Bowden and 
Graybill’s (1966) calculations for multivariate t distributions of any 
dimension with zero means and the equicorrelation structure $r_{ij} = \rho$, 
$i \neq j$. Consider iid standard normal random variables $Z_j$, $j = 0, 1, \ldots, p$, 
and let $S$ be a $\sqrt{\chi^2_\nu/\nu}$ random variable independent of the $Z_j$. Then 
the random variables defined by 

\[ X_j = \frac{\sqrt{1-\rho}\,Z_j - \sqrt{\rho}\,Z_0}{S}, \qquad 1 \le j \le p, \]

have the desired multivariate t distribution. Thus 


\[
P = \Pr(X_1 \le d_1, \ldots, X_p \le d_p)
= \int_0^\infty \int_{-\infty}^{\infty} \Pr(Z_j \le e_j, \ 1 \le j \le p) \, \phi(z) \, h(s) \, dz \, ds,
\tag{8.9}
\]


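where $e_j = (d_j s + \sqrt{\rho}\,z)/\sqrt{1-\rho}$, $\phi$ is the pdf of the standard normal 
distribution, and $h$ is the pdf of $S$. Dunnett and Tamhane (1992) obtained 
the following recursive formula for evaluating 
$\Phi_{1 \ldots p} = \Pr(Z_1 \le e_1, \ldots, Z_p \le e_p)$ in the integrand of (8.9): 


\[
\begin{aligned}
\Phi_{1 \ldots p} ={}& \Pr(Z_1 \le e_2, Z_2 \le e_3, \ldots, Z_{p-1} \le e_p) \, \Phi(e_1) \\
&+ \Pr(Z_1 \le e_1, Z_2 \le e_3, \ldots, Z_{p-1} \le e_p) \, \{\Phi(e_2) - \Phi(e_1)\} \\
&+ \cdots \\
&+ \Pr(Z_1 \le e_1, Z_2 \le e_2, \ldots, Z_{p-1} \le e_{p-1}) \, \{\Phi(e_p) - \Phi(e_{p-1})\},
\end{aligned}
\tag{8.10}
\]


where $\Phi(\cdot)$ denotes the cdf of the standard normal distribution. They 
also suggested the following algorithm for computing $\Phi_{1 \ldots p}$: 


Step 1: Calculate $\Phi_j = \Phi(e_j)$, for $j = 1, \ldots, p$, a total of $p$ terms. 
Step 2: Calculate $\Phi_{jk} = \Phi_k \Phi_j + \Phi_j(\Phi_k - \Phi_j)$, for $1 \le j < k \le p$, 
a total of $\binom{p}{2}$ terms. 

Step 3: Calculate $\Phi_{ijk} = \Phi_{jk}\Phi_i + \Phi_{ik}(\Phi_j - \Phi_i) + \Phi_{ij}(\Phi_k - \Phi_j)$, 
for $1 \le i < j < k \le p$, a total of $\binom{p}{3}$ terms. 


Step p: Calculate $\Phi_{12 \ldots p} = \Phi_{2 \ldots p}\Phi_1 + \Phi_{13 \ldots p}(\Phi_2 - \Phi_1) + \cdots + 
\Phi_{1 \ldots p-1}(\Phi_p - \Phi_{p-1})$. 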
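
The steps above can be sketched in code. The sketch assumes that $\Phi_{1 \ldots p}$ is read as the probability that the ordered values of $p$ iid standard normal variables lie below thresholds $e_1 \le \cdots \le e_p$, which is the quantity that Step 2 produces ($\Phi_{jk} = 2\Phi_j\Phi_k - \Phi_j^2$). The function name `ordered_normal_prob` and the memoized recursion over index subsets are illustrative choices, not the implementation of Dunnett and Tamhane.

```python
from functools import lru_cache
from statistics import NormalDist


def ordered_normal_prob(e):
    """Evaluate Phi_{1...p} for thresholds e_1 <= ... <= e_p via the recursion (8.10)."""
    e = tuple(e)
    if any(a > b for a, b in zip(e, e[1:])):
        raise ValueError("thresholds must be in nondecreasing order")
    phi = [NormalDist().cdf(x) for x in e]  # Step 1: Phi_j = Phi(e_j)

    @lru_cache(maxsize=None)
    def prob(idx):
        # idx is an increasing tuple of threshold indices, e.g. (0, 2) for Phi_{13}.
        if not idx:
            return 1.0
        # First term: drop the smallest threshold, multiply by its Phi value.
        total = prob(idx[1:]) * phi[idx[0]]
        # Remaining terms: drop threshold t, multiply by Phi_t - Phi_{t-1}.
        for t in range(1, len(idx)):
            rest = idx[:t] + idx[t + 1:]
            total += prob(rest) * (phi[idx[t]] - phi[idx[t - 1]])
        return total

    return prob(tuple(range(len(e))))
```

Memoizing on index subsets mirrors Steps 1 to $p$: every $\Phi$ with $k$ subscripts is built once from those with $k - 1$ subscripts. The number of subsets grows as $2^p$, so this direct form is only practical for small $p$.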


The computational details of this algorithm can be found in Dunnett 
and Tamhane (1990). Kwong and Liu (2000) — using Kwong’s (2001b) 
lemma — proposed the following modification of (8.10) 


m 


m (5) Cna Jn (810 


where $m \ge 2$, $J_1 = \Phi(e_1)$, and $J_0 = 1$ (see also Kwong, 2001a). 

There are three commonly known approaches for determining the percentage 
points $d_1, \ldots, d_p$ in (8.9) (after setting $P = \gamma$): the step-up 
procedure, the step-down procedure, and the simulation approach (Dunnett 
and Tamhane, 1995). We shall describe them below by means of recursive 
algorithms. Throughout we shall let $X_{(1)} \le \cdots \le X_{(p)}$ denote the 
order statistics of $X_1, \ldots, X_p$ and let $c_1, \ldots, c_p$ denote the corresponding 
ordering of $d_1, \ldots, d_p$. 


• Step-up procedure 
  (i) Take $c_1$ to be the $100\gamma$ percentage point of the univariate 
      Student’s t distribution with degrees of freedom $\nu$. 
  (ii) Solve the equation 

\[
\Pr(X_{(1)} \le c_1, X_{(2)} \le c_2)
= \Pr(X_1 \le c_1, X_2 \le c_2) + \Pr(c_1 < X_1 \le c_2, X_2 \le c_1)
= \gamma
\]

      for $c_2$ by evaluating the two bivariate probabilities using (8.9) 
      and the value for $c_1$ defined in (i). 
  (iii) Solve the equation 

\[
\begin{aligned}
\Pr(X_{(1)} \le c_1, X_{(2)} \le c_2, X_{(3)} \le c_3)
={}& \Pr(X_1 \le c_1, X_2 \le c_2, X_3 \le c_3) \\
&+ \Pr(X_1 \le c_1, c_2 < X_2 \le c_3, X_3 \le c_2) \\
&+ \Pr(c_1 < X_1 \le c_2, X_2 \le c_1, X_3 \le c_3) \\
&+ \Pr(c_1 < X_1 \le c_2, c_1 < X_2 \le c_3, X_3 \le c_1) \\
&+ \Pr(c_2 < X_1 \le c_3, X_2 \le c_1, X_3 \le c_2) \\
&+ \Pr(c_2 < X_1 \le c_3, c_1 < X_2 \le c_2, X_3 \le c_1) \\
={}& \gamma
\end{aligned}
\]

      for $c_3$ by evaluating the six trivariate probabilities using (8.9) 
      and the values for $c_1$ and $c_2$ defined in (i) and (ii), respectively. 
  (iv) In general, the recursive formula below defines how the region 
      over which the probability must be evaluated can be subdivided 
      to obtain probability expressions: 

\[
\begin{aligned}
\left[ X_{(1)} \le c_1, \ldots, X_{(p)} \le c_p \right]
={}& \left\{ X_1 \le c_1, \ \left[ X_{(2)} \le c_2, \ldots, X_{(p)} \le c_p \right] \right\} \\
&+ \left\{ c_1 < X_1 \le c_2, \ \left[ X_{(2)} \le c_1, X_{(3)} \le c_3, \ldots, X_{(p)} \le c_p \right] \right\} \\
&+ \cdots \\
&+ \left\{ c_{p-1} < X_1 \le c_p, \ \left[ X_{(2)} \le c_1, \ldots, X_{(p)} \le c_{p-1} \right] \right\},
\end{aligned}
\tag{8.12}
\]

      where $X_{(2)} \le \cdots \le X_{(p)}$ denote the order statistics of $X_2, \ldots, 
      X_p$ with $X_1$ separated out. Formula (8.12) is applied recursively 
      to the terms enclosed within the square brackets. This 
      leads to a division of the region into $p!$ subregions that have 
      rectangular boundaries, making it possible to evaluate the 
      individual probabilities (using (8.9)). 


• Step-down procedure: See Dunnett and Tamhane (1991) for a lucid 
  description. 


• Simulation approach: This approach is feasible provided that $p$ is not 
  so large that sampling errors in the values of $c_3, \ldots, c_{p-1}$ accumulate 
  and render the estimated value of $c_p$ too uncertain to be of practical 
  use. The procedure for estimating $c_m$ given the values of $c_1, \ldots, c_{m-1}$ 
  (Edwards and Berry, 1987) is as follows. 


  (i) Let $N_T$ denote the total number of simulations to be performed 
      and choose $N_T$ so that $N_0 = (1 - \gamma)(1 + N_T)$ is an integer. 


  (ii) Initialize a counter, $N_c = N_0$. 


  (iii) For each simulation, draw $m$ standard normal deviates $Z_1, \ldots, 
      Z_m$ having the desired correlation structure and, if $\nu$ is finite, 
      a random $\chi^2_\nu/\nu$ variate $S^2$. 


  (iv) Set $X_i = Z_i/S$ if $\nu$ is finite or $X_i = Z_i$ if $\nu = \infty$ and order the 
      $X$ values to obtain the order statistics $X_{(1)} \le \cdots \le X_{(m)}$. 


  (v) Check whether $X_{(1)} \le c_1, \ldots, X_{(m-1)} \le c_{m-1}$. If this is the 
      case, store the value of $X_{(m)}$ and return to step (iii). Otherwise, 
      decrease $N_c$ by 1 and return to step (iii). 


  (vi) After completing the $N_T$ simulations, find the estimate of $c_m$ 
      by counting down $N_c$ from the top of the ordered values of the 
      stored $X_{(m)}$. Note that this approach is general and does not 
      impose restrictions on the correlation structure $r_{ij}$. 
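
Steps (i) to (vi) translate almost line for line into code. The sketch below is restricted, for brevity, to the equicorrelated case with $\rho \ge 0$, generating the correlated normal deviates through a shared factor as in Section 8.13; that restriction, the fixed seed, and the function name `simulate_cm` are assumptions of this example and are not part of the procedure of Edwards and Berry.

```python
import math
import random


def simulate_cm(c_prev, gamma, nu, rho, n_total, seed=0):
    """Estimate c_m given c_1..c_{m-1} by steps (i)-(vi); equicorrelated case only."""
    rng = random.Random(seed)
    m = len(c_prev) + 1
    n0 = (1.0 - gamma) * (1 + n_total)                 # step (i)
    if abs(n0 - round(n0)) > 1e-9:
        raise ValueError("(1 - gamma)(1 + n_total) must be an integer")
    n_c = int(round(n0))                               # step (ii)
    stored = []
    a, b = math.sqrt(1.0 - rho), math.sqrt(rho)
    for _ in range(n_total):
        z0 = rng.gauss(0.0, 1.0)                       # shared factor inducing correlation rho
        z = [a * rng.gauss(0.0, 1.0) - b * z0 for _ in range(m)]   # step (iii)
        if math.isinf(nu):
            s = 1.0
        else:
            # chi^2_nu is a gamma variate with shape nu/2 and scale 2
            s = math.sqrt(rng.gammavariate(0.5 * nu, 2.0) / nu)
        x = sorted(v / s for v in z)                   # step (iv)
        if all(x[j] <= c_prev[j] for j in range(m - 1)):           # step (v)
            stored.append(x[m - 1])
        else:
            n_c -= 1
    if n_c < 1 or n_c > len(stored):
        raise RuntimeError("too few accepted simulations to locate c_m")
    stored.sort(reverse=True)
    return stored[n_c - 1]                             # step (vi): N_c-th value from the top
```

For example, `simulate_cm([], 0.95, math.inf, 0.0, 19999)` estimates $c_1$ in the normal case, and the result can then be passed as `c_prev=[c1]` to estimate $c_2$. As the text notes, the sampling error in earlier constants propagates to later ones.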


Dunnett and Tamhane (1992) computed values of $d_1, \ldots, d_p$ using the 
step-up procedure for all combinations of $p = 2(1)8$; $\nu = 10, 20, 30, \infty$; 
$\rho = 0, 0.1(0.2)0.5$; and $\gamma = 0.95$. Kwong and Liu (2000), using the 
same procedure but with the modification (8.11), computed values of 
$d_1, \ldots, d_p$ for all combinations of $p = 9(1)20$; $\nu = 10, 20, 30, \infty$; $\rho = 0.1, 
0.3, 0.5$; and $\gamma = 0.95$. 


8.14 Kwong and Liu’s Percentage Points 


In the case $r_{ij} = b_i b_j$, Kwong and Liu (2000) pointed out that (8.9) can 
be generalized to 


\[ P = \int_0^\infty \int_{-\infty}^{\infty} H_{1 \ldots p} \, \phi(z) \, h(s) \, dz \, ds, \]


where $H_{1 \ldots p} = \Pr(X_1 \le e_1, \ldots, X_p \le e_p)$, $e_j = d_j s$, and the $X_i$ are 
independent normal random variables with means $b_i z$ and variances 
$1 - b_i^2$. The recursive formula (8.10) and the algorithm for computing it 
can be generalized in a natural manner. Hence the step-up, step-down, 
or simulation-based procedure can be used to compute the percentage 
points $d_1, \ldots, d_p$. A fourth procedure not discussed above is one based 
on approximation. 


(i) $c_1$ and $c_2$ can be determined as in the step-up procedure. 

(ii) To determine $c_3$, replace $r_{12}$, $r_{13}$, and $r_{23}$ by 
     $\rho_3 = (r_{12} + r_{13} + r_{23})/3$. Taking this as the common $\rho$ and using 
     the previous values $c_1$, $c_2$, apply the step-up procedure to estimate $c_3$. 

(iii) To determine $c_4$, replace $r_{12}$, $r_{13}$, $r_{14}$, $r_{23}$, $r_{24}$, and $r_{34}$ by 
     $\rho_4 = (r_{12} + r_{13} + r_{14} + r_{23} + r_{24} + r_{34})/6$. Taking this as the common 
     $\rho$ and using the previous values $c_1$, $c_2$, $c_3$, apply the step-up 
     procedure to estimate $c_4$. 

(iv) In general, replace the $\binom{m}{2}$ correlation coefficients by their 
     arithmetic average $\rho_m$ and use the previously calculated values of 
     $c_1, \ldots, c_{m-1}$ to obtain an estimate for $c_m$ using the step-up 
     procedure. 
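
The averaging in steps (ii) to (iv) is simple to state in code. The helper below is a hypothetical illustration (the name `average_correlation` and the list-of-lists matrix layout are assumptions) that returns $\rho_m$ for the leading $m \times m$ block of a correlation matrix.

```python
def average_correlation(r, m):
    """Arithmetic mean rho_m of the m(m-1)/2 correlations among the first m variables.

    r is a symmetric correlation matrix given as a list of lists; requires m >= 2.
    """
    pairs = [r[i][j] for i in range(m) for j in range(i + 1, m)]
    return sum(pairs) / len(pairs)
```

Calling it for $m = 3, 4, \ldots, p$ gives the sequence of common correlations that the step-up procedure would use in turn.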


This procedure is similar to the ones presented in Dunnett (1985), Hochberg 
and Tamhane (1987, page 146), and Iyengar (1988). 


8.15 Other Results 


Some other tabulations of percentage points of the multivariate t 
distributions with the equicorrelation structure are contained in the 
following references. 


• Paulson (1952) for $p = 3, 6$ and $\rho = 0$. 

• Dunnett and Sobel (1955) used the lower bounds (7.4), (7.5), (7.6), 
  and (7.10) for $p = 3, 9$; $\nu = 5, \infty$; $\rho = 1/2$; and $\gamma = 0.50, 0.75, 0.95, 
  0.99$. 

• Halperin et al. (1955) for $p = 3(1)10, 15, 20, 30, 40, 60$; $\nu = 3(1)10, 
  15, 20, 30, 40, 60, 120, \infty$; $\rho = 1 - 1/p$; and $\gamma = 0.95, 0.99$. 

• Gupta (1963) for $p = 1(1)50$; $\nu = \infty$; $\rho = 1/2$; and $\gamma = 0.75, 0.9, 0.95, 
  0.975$. 

• Milton (1963) for extensions of Gupta’s (1963) tables for $\gamma$ ranging 
  from 0.5 to 0.9999. 

• Steffens (1969b) for $p = 2$. 

• Dunn and Massey (1965) for $p = 2, 6, 10, 20$; $\nu = 4, 10, 30, \infty$; 
  $\rho = 0.0(0.1)1.0$; and $\gamma = 0.5(0.1)0.9, 0.95, 0.975, 0.99$. 

• Tong (1970) for a procedure to calculate conservative estimates of the 
  percentage points for $p > 20$ using tabulated values for $p = 20$. 

• Freeman and Kuzmack (1972) for $p = 6, 8, 10(5)30$; $N_0 = 10, 20$, 
  mean (40-70), 50, 100, mean (90-500), 500; $\rho = 0$; and $\gamma = 0.90, 
  0.95, 0.99$, where $\nu = p(N_0 - 1)$. 

• Gupta et al. (1973) for $p = 1(1)10(2)50$; $\nu = \infty$; $\rho = 0.1, 0.125, 0.2, 
  1/3, 0.375, 0.4, 1/2, 0.64, 0.625, 2/3, 0.7, 0.75, 0.8, 0.875, 0.9$; and 
  $\gamma = 0.75, 0.9, 0.95, 0.975, 0.99$. 

• Amos (1978) for $p = 100$; $\nu = 100$; $\rho = 0.01$; and $\gamma = 0.05, 0.1, \ldots, 
  0.95$. 

• Ahner and Passing (1983) for $1 \le p \le 20$; $2 \le \nu \le 120$, $\nu = \infty$; $\rho = 0$; 
  and $\gamma = 0.95, 0.99$. 

• Bechhofer and Dunnett (1988) for the most comprehensive table to 
  date for $p = 2(1)16, 18, 20$; $\nu = 2(1)30(5)50, 60(20)120, 200, \infty$; 
  $\rho = 0.0(0.1)0.9, 1/(1 + \sqrt{p})$; and $\gamma = 0.80, 0.90, 0.95, 0.99$. 

• Kwong and Iglewicz (1996) for $p = 4, 5$; $\nu = p(1)20, 24, 30(10)100, 
  120, \infty$; $\rho = -1/(p - 1)$; and $\gamma = 0.90, 0.95, 0.99$ (note that the 
  correlation matrix is singular). 


There has been relatively little work concerned with the percentage 
points of multivariate t distributions when the correlation structure is 
not of the equicorrelation form. Apart from those mentioned above, 
four results known to us are 


e In the case that the (i,7)th element of the inverse of R is 


-2/(p+1), ift #3, 
Goldberg and Levine (1946) computed the percentage point d satis- 
fying 


yi = eee ift = J, 


\[ \int_{-\infty}^{d} \cdots \int_{-\infty}^{d} f(x_1, \ldots, x_p; \nu) \, dx_1 \cdots dx_p = \gamma \tag{8.13} \]


for combinations of $p = 3$; $\nu = 1(1)30(3)60(15)120, 150, 300, 600, \infty$; 
and $\gamma = 0.50, 0.75, 0.90, 0.95$. This seems to be the earliest paper on 
this topic. 
• In the case 


e ar 2min(i,j) f1- PZG), 


Freeman et al. (1967) computed the percentage point $d$ satisfying 
(8.13) for combinations of $p = 3(1)5$; $(\nu/p) + 1 = 10(10)100, 200, 
500$; and $\gamma = 0.95$. See also Bechhofer et al. (1954) and Table 4 in 
Dunnett and Sobel (1954). 
• In the case that the $(i, j)$th element of $R$ is 

\[ r_{ij} = \sqrt{\frac{n_i n_j}{(n_0 + n_i)(n_0 + n_j)}}, \]

where $n_k$ denotes some treatment sample size, Dutt et al. (1976) 
computed $d$ in (8.13) for all combinations of $\gamma = 0.95, 0.99$ and 
$3 \le n_i \le 12$, $i = 0, 1, 2, 3$, with $p = 3$ and the degrees of freedom 
$\nu = \sum_{i=0}^{3} (n_i - 1)$. See also Dutt et al. (1975). 
• In the case $r_{ij} = 0$ for all $i \neq j$ except that 


\[ r_{i,i+1} = -\sqrt{\frac{n_i n_{i+2}}{(n_i + n_{i+1})(n_{i+1} + n_{i+2})}} \]

for some treatment sample sizes $n_k$, Lee and Spurrier (1995) provided 
tables of one-sided and two-sided percentage points of the form 
(8.13) for $3 \le p \le 6$ and $n_k = n$ (the balanced case). Liu et 
al. (2000) extended these tables for $3 \le p \le 10$; $\nu = 5(1)8(2)20, 25, 
30, 40, 60, 120, \infty$; and $\gamma = 0.90, 0.95, 0.99$. See also Somerville et 
al. (2001). 


Calculations of percentage points for singular correlation structures 
of the form $r_{ij} = -b_i b_j$ are discussed in Spurrier and Isham (1985) and 
Kwong and Iglewicz (1996). The former provided tabulations of the 
percentage points for $p = 3$, $3 \le n_1 \le n_2 \le n_3$, $10 \le N \le 29$, and 
$\gamma = 0.90, 0.95, 0.99$ when $b_k = \sqrt{n_k/(N - n_k)}$ with $N = n_1 + n_2 + n_3$ 
(where $n_k$ denotes some treatment sample size). 

Calculations of percentage points for correlation structures more general 
than the decomposable structure $r_{ij} = b_i b_j$ are quite difficult and 
challenging. Even the generalization to quasi-decomposable structures 
of the form $r_{ij} = b_i b_j + \gamma_{ij}$ (Yang and Zhang, 1997), where the $\gamma_{ij}$’s are 
nonzero deviations for some $i$ and $j$, is a rather restrictive assumption. 
A solution is to find the matrix “closest” to a given $R$ that still 
possesses the decomposable correlation structure (Hsu, 1992; Hsu and 
Nelson, 1998). One also has the choice of adopting the simulation 
approach or one of the general approaches due to Somerville (1997, 1998b) 
and Genz and Bretz (1999), described in Section 6.10. 


9 
Sampling Distributions 


Here, we shall consider sampling distributions of certain statistics asso- 
ciated with multivariate ¢ distributions. 


9.1 Wishart Matrix 


Suppose $X_1, \ldots, X_n$ is a random sample from a $p$-variate t distribution 
with the common pdf 

\[
f(x_i) = \frac{\Gamma((\nu + p)/2)}{(\pi\nu)^{p/2} \, \Gamma(\nu/2) \, |R|^{1/2}}
\left[ 1 + \frac{1}{\nu} (x_i - \mu)^T R^{-1} (x_i - \mu) \right]^{-(\nu + p)/2}.
\]


The joint pdf of the $n$ independent observations is given by 


\[ f(x_1, \ldots, x_n) = f(x_1) \cdots f(x_n). \tag{9.1} \]


However, it is more instructive to consider dependent but uncorrelated 
t distributions. Joarder and Ahmed (1996) suggested the model 


\[
f(x_1, \ldots, x_n) = \frac{\Gamma((\nu + np)/2)}{(\pi\nu)^{np/2} \, \Gamma(\nu/2) \, |R|^{n/2}}
\left[ 1 + \frac{1}{\nu} \sum_{i=1}^{n} (x_i - \mu)^T R^{-1} (x_i - \mu) \right]^{-(\nu + np)/2},
\tag{9.2}
\]


which they referred to as the multivariate t model. Joarder and Ali 
(1997) remarked that this model can also be written as a scale mixture 
of multivariate normal distributions given by 


\[
f(x_1, \ldots, x_n) = \int_0^\infty \frac{|\tau^2 R|^{-n/2}}{(2\pi)^{np/2}}
\exp\left\{ -\frac{1}{2} \sum_{i=1}^{n} (x_i - \mu)^T (\tau^2 R)^{-1} (x_i - \mu) \right\} h(\tau) \, d\tau,
\]

where $\tau$ has the inverted gamma distribution with the pdf 


\[
h(\tau) = \frac{2 \tau^{-(1+\nu)}}{(2/\nu)^{\nu/2} \, \Gamma(\nu/2)} \exp\left( -\frac{\nu}{2\tau^2} \right)
\tag{9.3}
\]

for $\tau > 0$. Equivalently, $X_i \mid \tau$ has the multivariate normal distribution 
$N_p(\mu, \tau^2 R)$. Among others, Zellner (1976) and Sutradhar and Ali (1986) 
considered (9.2) in the context of stock market problems. By successive 
integration, one can show that the marginal distribution of $X_i$ in the 
multivariate t model (9.2) is $p$-variate t. It also follows from (9.2) that 
$E[(X_i - \mu)(X_j - \mu)^T] = 0$ for $i \neq j$. Thus, in (9.2), although $X_1, \ldots, X_n$ 
are pairwise uncorrelated, they are not necessarily independent. More 
specifically, $X_1, \ldots, X_n$ in (9.2) are not independent if $\nu < \infty$, since 
independence would imply that $X_1, \ldots, X_n$ are normally distributed. 
The case of independent normally distributed random vectors can be 
included in (9.2) by letting $\nu \to \infty$. In the case $\nu = 1$, (9.2) is the 
multivariate Cauchy distribution for which neither the mean nor the 
variance exists. Kelejian and Prucha (1985) proved that (9.2) is better 
able to capture heavy-tailed behavior than an independent t model given 
by (9.1). 
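
The scale-mixture form suggests a direct way to simulate from the multivariate t model: draw a single $\tau$ for the whole sample and then draw the $n$ observations as conditionally independent normal vectors. The sketch below assumes, consistent with the inverted gamma pdf (9.3), that $\tau^2 = \nu/\chi^2_\nu$; the function name `sample_t_model` and the use of a user-supplied lower-triangular factor $L$ with $LL^T = R$ are illustrative choices.

```python
import math
import random


def sample_t_model(n, mu, chol, nu, rng):
    """Draw X_1..X_n from model (9.2) via X_i = mu + tau * L Z_i with one shared tau.

    chol is a lower-triangular factor L with L L^T = R; tau^2 = nu / chi^2_nu.
    """
    p = len(mu)
    # chi^2_nu is a gamma variate with shape nu/2 and scale 2
    tau = math.sqrt(nu / rng.gammavariate(0.5 * nu, 2.0))
    sample = []
    for _ in range(n):
        z = [rng.gauss(0.0, 1.0) for _ in range(p)]
        x = [mu[r] + tau * sum(chol[r][c] * z[c] for c in range(r + 1))
             for r in range(p)]
        sample.append(x)
    return sample
```

Because the same $\tau$ scales every observation in a sample, large values tend to occur together: the observations are uncorrelated but dependent, which is the distinction from the independent model (9.1) drawn in the text.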

The sampling quantities of interest are the mean vector and the sum 
of product matrix (Wishart matrix) given by 


\[ \bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i \]

and 


\[ A = \sum_{i=1}^{n} (X_i - \bar{X})(X_i - \bar{X})^T, \tag{9.4} \]


respectively. Sutradhar and Ali (1989) derived the corresponding pdfs, 
which are 

\[
f(\bar{x}) = \frac{\nu^{\nu/2} \, \Gamma((\nu + p)/2)}{\pi^{p/2} \, \Gamma(\nu/2)} \, |R/n|^{-1/2}
\left[ \nu + n (\bar{x} - \mu)^T R^{-1} (\bar{x} - \mu) \right]^{-(\nu + p)/2}
\]

and 

\[
f(A) = \frac{\nu^{\nu/2} \, \Gamma((\nu + (n-1)p)/2)}{\Gamma(\nu/2) \, \Gamma_p((n-1)/2)} \,
|R|^{-(n-1)/2} \, |A|^{(n-p-2)/2}
\left[ \nu + \mathrm{tr}(R^{-1} A) \right]^{-(\nu + (n-1)p)/2},
\tag{9.5}
\]


respectively, where $A > 0$, $n > 1 + p$, and $\Gamma_p(z)$ is the generalized gamma 
function defined by 


\[ \Gamma_p(z) = \pi^{p(p-1)/4} \prod_{i=1}^{p} \Gamma\left( z - \frac{i-1}{2} \right). \tag{9.6} \]

The distribution of the Wishart matrix, (9.5), has its applications in 
factor analysis. More specifically, in practice, one may be confronted with 
the situation where the observed data have a symmetrical distribution 
with tails that are fatter than those predicted by the normal distribution. 
In such cases, one could explicitly account for the observed “fat tails” by 
using the multivariate t model (9.2). Consequently, in factor analysis, 
analogous to the Wishart distribution, one may use the distribution of 
the sum of products matrix under (9.2). 


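It is easily checked that, as $\nu \to \infty$, the pdf (9.5) converges to 

\[
\frac{|R|^{-(n-1)/2} \, |A|^{(n-p-2)/2}}{2^{(n-1)p/2} \, \Gamma_p((n-1)/2)}
\exp\left\{ -\frac{1}{2} \mathrm{tr}(R^{-1} A) \right\},
\]


which is the pdf of the usual Wishart distribution $W_p(R, n)$. The pdf 
(9.5) can also be written as the mixture of distributions 


\[
f(A) = \int_0^\infty \frac{|\tau^2 R|^{-(n-1)/2} \, |A|^{(n-p-2)/2}}{2^{(n-1)p/2} \, \Gamma_p((n-1)/2)}
\exp\left\{ -\frac{1}{2} \mathrm{tr}\left( (\tau^2 R)^{-1} A \right) \right\} h(\tau) \, d\tau,
\]


where $\tau$ has the inverted gamma pdf (9.3). This is equivalent to saying 
that $A \mid \tau = \tau^2 W$ has the usual Wishart distribution $W_p(\tau^2 R, n)$. 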

Joarder and Ali (1992) and Joarder (1998) derived various expectations 
of the Wishart matrix $A$. Specifically, one has the following expressions: 

\[ E(A) = \frac{n-1}{1 - 2/\nu} \, R, \qquad \nu > 2, \]

\[
E(A^2) = \frac{(n-1)^2 R^2 + (n-1) \{ R \, \mathrm{tr} R + R^2 \}}{(1 - 2/\nu)(1 - 4/\nu)},
\qquad \nu > 4,
\]

\[
E\left( |A|^k \right) = \frac{\Gamma(\nu/2 - kp) \, \Gamma_p((n-1)/2 + k)}{\nu^{-kp} \, \Gamma(\nu/2) \, \Gamma_p((n-1)/2)} \, |R|^k,
\qquad \nu > 2kp,
\]

\[
E\left( |A|^k A \right) = \frac{((n-1)/2 + k) \, \Gamma(\nu/2 - kp - 1)}{\nu^{-(kp+1)} \, \Gamma(\nu/2)} \,
\frac{\Gamma_p((n-1)/2 + k)}{\Gamma_p((n-1)/2)} \, |R|^k R,
\qquad n + 2k > 1, \ \nu > 2(kp + 1),
\]

\[
E\left( |A|^k A^{-1} \right) = \frac{2 \, \Gamma(\nu/2 - kp + 1) \, \Gamma_p((n-1)/2 + k)}{\nu^{1-kp} (n + 2k - p - 2) \, \Gamma(\nu/2) \, \Gamma_p((n-1)/2)} \, |R|^k R^{-1},
\qquad n + 2k > p + 2, \ \nu > 2(kp - 1),
\]

\[
E\left[ (\mathrm{tr} A)^2 \right] = \frac{n-1}{(1 - 2/\nu)(1 - 4/\nu)} \left[ (n-1)(\mathrm{tr} R)^2 + 2 \, \mathrm{tr}(R^2) \right],
\qquad \nu > 4,
\]

and 

\[
E\left[ \mathrm{tr}(A^2) \right] = \frac{n-1}{(1 - 2/\nu)(1 - 4/\nu)} \left[ n \, \mathrm{tr}(R^2) + (\mathrm{tr} R)^2 \right],
\qquad \nu > 4,
\]


where $k$ is any real number and $\nu > 0$. These expectations are important 
tools in developing estimation theories for the correlation matrix, 
inverted correlation matrix, trace of the correlation matrix, and other 
characteristics of the correlation matrix, of the multivariate t model 
under quadratic loss functions. Extensions of these expectations to the 
class of scale mixtures of normal distributions (which may be useful in 
inferential works having a t distribution or the scale mixture of normal 
distributions as the parent population) are discussed below. 
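
As a short check on the first of these expectations, it can be recovered from the mixture representation alone. The lines below are a sketch of that argument rather than the derivation given in the cited papers; they assume that, conditionally on $\tau$, the matrix $A$ is Wishart with $n - 1$ degrees of freedom and scale matrix $\tau^2 R$ (consistent with the exponent $-(n-1)/2$ on $|R|$ in (9.5)) and that $\nu/\tau^2$ is a $\chi^2_\nu$ variable, as implied by (9.3).

```latex
\begin{aligned}
E(A \mid \tau) &= (n-1)\,\tau^2 R, \\
E(\tau^2) &= \nu \, E\!\left( \frac{1}{\chi^2_\nu} \right) = \frac{\nu}{\nu - 2}, \qquad \nu > 2, \\
E(A) &= E\{ E(A \mid \tau) \} = (n-1) \, \frac{\nu}{\nu - 2} \, R = \frac{n-1}{1 - 2/\nu} \, R .
\end{aligned}
```

The same conditioning argument, with $E(\tau^4) = \nu^2/\{(\nu-2)(\nu-4)\}$ for $\nu > 4$, accounts for the factor $(1 - 2/\nu)(1 - 4/\nu)$ in the second-order expectations.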

Sutradhar and Ali (1989) derived an elementwise expression for the
variance-covariance matrix of $\mathbf{A}$. Letting $m_{ij}$ denote the $(i, j)$th element
of $\mathbf{R}^{1/2}$, they showed that


v-2 P 
Cov (Aij, An) = 70-1) YO MiuMjuMkuMu 
u=1 
p 
+(n — 1)(n — 2) 5. MiuMjuMku Mu 
u=1 


+(n-— 1) > MiuMjuMevMiy 


uu
2
+(n- 1) >S MivMjyMkuMiu 
uu 


+(n = 1) X MiuMju (MkuMiv + MkuMiu) 
u<u 


+(n-1) ` MivMju (MkuMiy + masma) 


u<u 


p 
(n— 1)? > mam) 2 men) 
u=l 
for $i \neq k$, $j \neq l$, $i, j, k, l = 1, \ldots, p$, and
yY — 
Var (A;) = = A n-1) (x: mame) + 2(n —1) >> mim), 


+(n- 1) bD (MiuMju + ma) 


u<u 


2 
-(n- 1) HS mum) 


for $i, j = 1, \ldots, p$.

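Let $\mathbf{A} = \mathbf{T}\mathbf{T}^T$ and $\mathbf{A} = \mathbf{S}\mathbf{M}\mathbf{S}^T$ be the triangular and spectral decompositions of $\mathbf{A}$. Let $\mathbf{W} = \mathbf{U}\mathbf{U}^T$ be the triangular decomposition of $\mathbf{W} \sim W_p(\mathbf{R}, n)$. Let $l_1, \ldots, l_p$ and $m_1, \ldots, m_p$ denote the latent roots of $\mathbf{W}$ and $\mathbf{A}$, respectively. Also define

\[
d_i = \frac{1}{n+p+1-2i}
\]

and

\[
d_i^* = \frac{\nu-2}{\nu}\,\frac{1}{n+p+1-2i}
\]

with $\mathbf{D} = \operatorname{diag}(d_1, \ldots, d_p)$ and $\mathbf{D}^* = \operatorname{diag}(d_1^*, \ldots, d_p^*)$. Then, some further expectation identities involving $\mathbf{A}$ useful in the estimation of $\mathbf{R}$ are

\[
E\left[\log|\mathbf{A}|\right] = E\left[\log|\mathbf{W}|\right] + 2p\,E(\log\tau),
\]

\[
E\left[\log\left|\mathbf{R}^{-1}\mathbf{A}\right|\right] = \sum_{i=1}^{p} E\left[\log\left(\chi^2_{n+1-i}\right)\right] + 2p\,E(\log\tau),
\]

and

\[
E\left[\operatorname{tr}\left(\mathbf{R}^{-1}\mathbf{T}\boldsymbol{\Delta}\mathbf{T}^T\right)\right] = \frac{\nu}{\nu-2}\,E\left[\operatorname{tr}\left(\mathbf{R}^{-1}\mathbf{U}\boldsymbol{\Delta}\mathbf{U}^T\right)\right]
\]

(Joarder and Ali, 1997), where $\boldsymbol{\Delta}$ is a positive definite diagonal matrix and $\tau$ has the inverted gamma distribution given by (9.3).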

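Joarder and Ahmed (1998) considered a generalization of the multivariate t model in (9.2) when the random sample $\mathbf{X}_1, \ldots, \mathbf{X}_n$ is assumed to come from a p-variate elliptical distribution with the pdf

\[
f(\mathbf{x}_i) = \int_0^\infty \frac{|\tau^2\mathbf{R}|^{-1/2}}{(2\pi)^{p/2}}
\exp\left\{-\frac{1}{2}(\mathbf{x}_i-\boldsymbol{\mu})^T(\tau^2\mathbf{R})^{-1}(\mathbf{x}_i-\boldsymbol{\mu})\right\} h(\tau)\,d\tau,
\]

where $h(\cdot)$ is the pdf of a nondiscrete random variable $\tau$. Many multivariate distributions having a constant pdf on the hyperellipse $(\mathbf{x}-\boldsymbol{\mu})^T\mathbf{R}^{-1}(\mathbf{x}-\boldsymbol{\mu}) = c^2$ may be generated by varying $h(\cdot)$. In this general case, the model corresponding to (9.2) has the joint pdf

\[
f(\mathbf{x}_1, \ldots, \mathbf{x}_n) = \int_0^\infty \frac{|\tau^2\mathbf{R}|^{-n/2}}{(2\pi)^{np/2}}
\exp\left\{-\frac{1}{2}\sum_{i=1}^{n}(\mathbf{x}_i-\boldsymbol{\mu})^T(\tau^2\mathbf{R})^{-1}(\mathbf{x}_i-\boldsymbol{\mu})\right\} h(\tau)\,d\tau. \tag{9.7}
\]

The observations $\mathbf{X}_1, \ldots, \mathbf{X}_n$ are independent only if $\tau$ is degenerate at the point unity, in which case the joint pdf (9.7) denotes the pdf of the product of $n$ independent p-variate normal distributions each being $N_p(\boldsymbol{\mu}, \mathbf{R})$. Furthermore, if $\nu/\tau^2$ has the chi-squared distribution with degrees of freedom $\nu$, then (9.7) reduces to (9.2). The pdf of the Wishart matrix $\mathbf{A}$ under the generalized model (9.7) takes the form

\[
f(\mathbf{A}) = \frac{|\mathbf{R}|^{-n/2}\,|\mathbf{A}|^{(n-p-1)/2}}{2^{np/2}\,\Gamma_p(n/2)}
\int_0^\infty \tau^{-np}\exp\left\{-\frac{1}{2}\operatorname{tr}\left((\tau^2\mathbf{R})^{-1}\mathbf{A}\right)\right\} h(\tau)\,d\tau,
\]

where $\mathbf{A} > 0$, $n > p+1$, and $\Gamma_p(\cdot)$ is as defined in (9.6). Some expectations of $\mathbf{A}$ useful for estimating $\mathbf{R}$ are

\[
E\left(|\mathbf{A}|^k\right) = 2^{kp}\,\frac{\Gamma_p(n/2+k)}{\Gamma_p(n/2)}\,|\mathbf{R}|^k\,\gamma_{2kp},
\]

\[
E\left(|\mathbf{A}|^k\mathbf{A}\right) = 2^{kp}\,(n+2k)\,\frac{\Gamma_p(n/2+k)}{\Gamma_p(n/2)}\,|\mathbf{R}|^k\,\mathbf{R}\,\gamma_{2kp+2},
\]

and

\[
E\left[(\operatorname{tr}\mathbf{A})^2\right] = n\gamma_4\left[n(\operatorname{tr}\mathbf{R})^2 + 2\operatorname{tr}(\mathbf{R}^2)\right],
\]

where $k \in \mathbb{R}$, $n+2k > 0$, $\gamma_{2kp} = E(\tau^{2kp}) > 0$, $\gamma_{2kp+2} = E(\tau^{2kp+2}) > 0$, and $\gamma_4 = E(\tau^4)$ (all assumed to exist).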

The pdfs of A in the real and complex cases — under the indepen- 
dence model (9.1) — were originally studied by Cornish (1955) and Gupta 
(1964), respectively. Nagarsenker (1975) provided a very detailed study 
of the distribution of A and its quadratic forms. He investigated both 
the noncentral real and the noncentral complex cases. 

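Let $\mathbf{Y}$ be a $p \times n$ matrix of iid normal random variables with means $E(Y_{ij}) = \mu_{ij}$ and covariance matrix $\sigma^2\mathbf{R}$. Assume that $S$ is an independent random variable having the $\sqrt{\sigma'^2\chi^2_{2\nu}/(2\nu)}$ distribution. Then the noncentral version of $\mathbf{A}$ is defined by

\[
A_{ij} = \frac{1}{S^2}\sum_{k=1}^{n}\left(Y_{ik}-\bar{Y}_i\right)\left(Y_{jk}-\bar{Y}_j\right),
\]

where

\[
\bar{Y}_i = \frac{1}{n}\sum_{j=1}^{n} Y_{ij}.
\]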

In the real case, S?A has the noncentral Wishart distribution. In the 
complex case, it should be interpreted as a Hermitian positive defi- 
nite matrix having the noncentral complex Wishart distribution (James, 
1964). In the noncentral real case, Nagarsenker (1975) established that 
the pdf of A is given by the complicated expression involving zonal 
polynomials 


f(A) 
(orn) |R |-(n-/2} A |(n—p)/2 exp {-trR- ppt / (20?)} 
aP=1) (2v)P™—DPL, (n — 1)/2)F(v) 
: ` v (o') T (v +k + p(n — 1)/2) Cr (Ruu TRA) 
ko k k(n — 1)/2)x (4v0*)* {1+ (o”trR-!A) /2v0?} 


—1 
POD 4 6, (9.8) 
where $\kappa = \{k_1, \ldots, k_m\}$, $k_1 \geq k_2 \geq \cdots \geq k_m \geq 0$, $k_1 + k_2 + \cdots + k_m = k$,

\[
(z)_\kappa = \frac{\Gamma_m(z, \kappa)}{\Gamma_m(z)},
\]

\[
\Gamma_m(z, \kappa) = \pi^{m(m-1)/4}\prod_{i=1}^{m}\Gamma\left(z + k_i - \frac{i-1}{2}\right),
\]

and $C_\kappa(\mathbf{T})$ are symmetric homogeneous polynomials of degree $k$ in the latent roots of $\mathbf{T}$. In the particular case $\sigma = \sigma'$ and $\boldsymbol{\mu} = \mathbf{0}$, (9.8) reduces to the expression given in Cornish (1955). If $\mathbf{B}$ is an $(n-1) \times (n-1)$ symmetric positive definite matrix of full rank, then Nagarsenker (1975) established further that the quadratic form $\mathbf{Q} = \mathbf{A}\mathbf{B}\mathbf{A}^T$ has the formidable pdf


f(Q) 
(on PD | R ae | Q |(n—P)/2 
a1) 2P- DAT, (n ~1)/2) Fw) | BP? 


gy ee ee) ( Jp (PPD ev). 


tao kaloy" Cy (In-1) 
(9.9) 


In the particular case $\sigma = \sigma'$ and $\mathbf{B} = \mathbf{I}_{n-1}$, (9.9) reduces to equation
(14) in Cornish (1955). Nagarsenker also provided the joint cdf and the
moment generating function of $\mathbf{Q}$ as well as the corresponding expressions
for the noncentral complex case (which generalizes those given in
Gupta, 1964).


9.2 Multivariate t Statistic 


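A random variable $X$ with mean $\mu$ and iid copies $X_1, X_2, \ldots$ is said to be in the domain of attraction of the normal law if there exists $a_n \to \infty$ such that

\[
\frac{1}{a_n}\sum_{i=1}^{n}(X_i - \mu) \to N(0, 1)
\]

as $n \to \infty$. It is well known that, for $X$ in the domain of attraction of the normal law, the t statistic defined by

\[
T_n = \frac{\sum_{i=1}^{n}(X_i - \mu)}{\sqrt{\sum_{i=1}^{n}(X_i - \bar{X})^2}} \to N(0, 1) \tag{9.10}
\]

as $n \to \infty$, where, as usual, $\bar{X} = (1/n)\sum_{i=1}^{n} X_i$.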

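Sepanski (1994, 1996) provided two multivariate analogs of (9.10). Let $\mathbf{X}$ be a p-variate random vector with mean vector $\boldsymbol{\mu}$ and covariance matrix $\boldsymbol{\Sigma}$. Also let $\mathbf{X}_1, \mathbf{X}_2, \ldots$ be iid copies of $\mathbf{X}$. Then $\mathbf{X}$ is said to be in the domain of attraction of a p-variate normal law if there exists $a_n \to \infty$ such that

\[
\frac{1}{a_n}\sum_{i=1}^{n}(\mathbf{X}_i - \boldsymbol{\mu}) \to N(\mathbf{0}, \mathbf{C}) \tag{9.11}
\]

for some nonsingular matrix $\mathbf{C}$. Sepanski (1994) defined the multivariate t statistic by

\[
\mathbf{T}_n = \mathbf{D}_n^{-1/2}\sum_{i=1}^{n}(\mathbf{X}_i - \boldsymbol{\mu}), \tag{9.12}
\]

where, for some sequence $b_n > 0$, $\mathbf{D}_n = \mathbf{C}_n + b_n\mathbf{I}$ and

\[
\mathbf{C}_n = \sum_{i=1}^{n}(\mathbf{X}_i - \bar{\mathbf{X}})(\mathbf{X}_i - \bar{\mathbf{X}})^T.
\]

Note that $\mathbf{C}_n$ is symmetric nonnegative definite while $\mathbf{D}_n$ is symmetric positive definite. Under the assumption that $\mathbf{X}$ satisfies (9.11), Sepanski (1994) showed that $\mathbf{T}_n \to N(\mathbf{0}, \mathbf{I})$ as $n \to \infty$. Sepanski (1996) established the same limiting result under weaker conditions by taking

\[
\mathbf{T}_n = \mathbf{C}_n^{-1/2}\sum_{i=1}^{n}(\mathbf{X}_i - \boldsymbol{\mu}) \tag{9.13}
\]

and considering its behavior when $\mathbf{X}$ is in the generalized domain of attraction of a normal law, which means that there exist matrices $\mathbf{A}_n$ and vectors $\boldsymbol{\mu}_n$ such that

\[
\mathbf{A}_n\sum_{i=1}^{n}\mathbf{X}_i - \boldsymbol{\mu}_n \to N(\mathbf{0}, \mathbf{I}) \tag{9.14}
\]

as $n \to \infty$. See Hahn and Klass (1980a, 1980b) for several examples of random vectors satisfying this condition and for an algorithm for constructing the normalizing matrices $\mathbf{A}_n$.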
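
As an illustration of (9.12) and (9.13), the following sketch computes the self-normalized vector from a data matrix. It assumes that C_n is the unnormalized sum of outer products of deviations from the sample mean, and it uses a symmetric inverse square root obtained from an eigendecomposition; the function name and the default b_n = 0 are choices made here for illustration.

```python
import numpy as np


def multivariate_t_statistic(X, mu, b_n=0.0):
    """Self-normalized sum D_n^{-1/2} * sum_i (X_i - mu).

    Assumption for this sketch: C_n = sum_i (X_i - Xbar)(X_i - Xbar)^T and
    D_n = C_n + b_n * I. With b_n = 0 this is the form (9.13), which needs
    C_n to be nonsingular (more observations than dimensions).
    """
    X = np.asarray(X, dtype=float)
    mu = np.asarray(mu, dtype=float)
    centered = X - X.mean(axis=0)
    C_n = centered.T @ centered
    D_n = C_n + b_n * np.eye(X.shape[1])
    # Symmetric inverse square root via the spectral decomposition.
    evals, evecs = np.linalg.eigh(D_n)
    inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T
    return inv_sqrt @ (X - mu).sum(axis=0)
```

With b_n = 0 the squared length of the returned vector equals n^2 (Xbar - mu)^T C_n^{-1} (Xbar - mu), the quadratic form that appears as Hotelling's statistic in the next section.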


9.3 Hotelling's $T^2$ Statistic

A customary approach to the estimation/testing problem is based on the so-called Hotelling's $T^2$ statistic. It is defined by

\[
H^2 = n^2(\bar{\mathbf{X}} - \boldsymbol{\mu})^T\mathbf{C}_n^{-1}(\bar{\mathbf{X}} - \boldsymbol{\mu}).
\]

Under normality, it is well known that $(n-p)H^2/(p(n-1))$ is distributed as an F distribution with degrees of freedom $p$ and $n-p$ (see, for example, Anderson, 1984). The distribution of $H^2$ has been studied under a mixture of two normal distributions by Srivastava and Awan (1982) and Kabe and Gupta (1990). Iwashita (1997) investigated the asymptotics of $H^2$ under an elliptical distribution. Unfortunately, there are no direct results for the specific case of multivariate t distributions. For completeness, however, we shall survey the results when $\mathbf{X}$ has an elliptical distribution. In this case, the characteristic function of $\mathbf{X}$ can be written as


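\[
\phi(\mathbf{t}) = \exp\left\{i\mathbf{t}^T\mathbf{m}\right\}\psi\left(\mathbf{t}^T\boldsymbol{\Omega}\mathbf{t}\right) \tag{9.15}
\]

for some nonnegative function $\psi$ (Kelker, 1970) and the parameter

\[
\kappa = \frac{\psi''(0)}{\{\psi'(0)\}^2} - 1,
\]

which controls the kurtosis of the distribution. Iwashita (1997) provided an asymptotic distribution of $H^2$ under the null hypothesis that $\mathbf{m} = \boldsymbol{\mu}$ and a local alternative of it. Up to the order of $1/n$, the asymptotic null pdf of $H^2$ is given by

\[
f(x) = g_p(x) + \frac{1}{n}\sum_{j=0}^{2} c_j\,g_{p+2j}(x),
\]

where

\[
c_0 = -\frac{1}{4}p\{p + \kappa(p+2)\},
\]

\[
c_1 = -\frac{1}{2}p\{1 - \kappa(p+2)\},
\]

\[
c_2 = \frac{1}{4}p(p+2)(1-\kappa),
\]

and $g_k(\cdot)$ denotes the pdf of a chi-squared distribution with degrees of freedom $k$. Note that $c_0 + c_1 + c_2 = 0$, so the correction term integrates to zero. Iwashita (1997) also derived the percentiles and approximate powers of the $H^2$ statistic. An asymptotic expansion of the cdf of $H^2$ under the two assumptions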


(i) $E(\|\mathbf{Y}\|^8) < \infty$, where $\mathbf{Y} = \boldsymbol{\Sigma}^{-1/2}(\mathbf{X} - \boldsymbol{\mu})$ and $\mathbf{X}$ is a $p \times 1$
random vector with mean vector $\boldsymbol{\mu}$ and covariance matrix $\boldsymbol{\Sigma}$;

(ii) the distribution of $\mathbf{Y} = (Y_1, \ldots, Y_p)$ has an absolutely continuous
component with a positive density on some nonempty open set
$U$ such that $1, y_1, \ldots, y_p, y_1^2, y_1y_2, \ldots, y_p^2$ are linearly independent
on $U$

is given by Fujikoshi (1997). It takes the form

\[
\Pr\left(H^2 \leq x\right) = G_p(x) + \frac{1}{n}\sum_{j=0}^{3}\beta_j\,G_{p+2j}(x) + o\left(\frac{1}{n}\right) \tag{9.16}
\]

uniformly for all positive real numbers $x$, where $G_k(\cdot)$ denotes the cdf
of a chi-squared distribution with degrees of freedom $k$. The coefficients
$\beta_j$'s are given by


= ghey 7 ant 10 
Bo = 4? +z (x ) 44 3 
— hee DO ob a 
A = -zp- 5 («$ ) t3“ , 
1 1 1 
fo = 5p(p+2)—5 (ns?) - Gat, 
and 
wo ON Dey" 
Bs = x (st?) +5 (a0?) | 
where 


a 
wom 
fay 
w 
Il 
— 
a 
~ 
2 
5 
= 
<~ 
— 


KP = 5y klibik) klikk) 5 
i,j,k 


1 bogus 
Kí BS SO RUD, 
i,j 


and $\kappa_{i_1 \cdots i_j}$ are the $j$th cumulants of $\mathbf{Y}$. If $\mathbf{X}$ has the elliptical distribution given by (9.15), then $\beta_3$ vanishes and $\beta_k$ reduces to $c_k$ for $k = 0, 1, 2$. Kano (1995) obtained the same asymptotic expansion as in (9.16), using a different method.

It is well known that, for large samples, $H^2$ has a limiting chi-squared distribution with degrees of freedom $p$. The usual underlying assumption for this result is simply that $E\|\mathbf{X}\|^2 < \infty$. More general limiting behavior of $H^2$ has been studied by Eaton and Efron (1970), Sepanski (1994), and Fujikoshi (1997). Sepanski (1994) showed that, under the assumption of the generalized domain of attraction (defined in (9.14)), the modified Hotelling's $T^2$ statistic

\[
H^2 = n^2(\bar{\mathbf{X}} - \boldsymbol{\mu})^T\mathbf{D}_n^{-1}(\bar{\mathbf{X}} - \boldsymbol{\mu})
\]

still has an asymptotic chi-squared distribution with degrees of freedom $p$. Eaton and Efron (1970) studied the distribution of $H^2$ when $\mathbf{X}$ has orthant symmetry, that is, $\mathbf{X}$ has the same distribution as $\mathbf{D}\mathbf{X}$ for any choice of the diagonal matrix $\mathbf{D}$ with diagonal elements equal to $\pm 1$.
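
We shall now consider Hotelling's $T^2$ statistic in the context of testing equality of means. Suppose $\mathbf{X}_1 = (\mathbf{X}_{11}, \ldots, \mathbf{X}_{1n_1})^T$ and $\mathbf{X}_2 = (\mathbf{X}_{21}, \ldots, \mathbf{X}_{2n_2})^T$ are two samples of size $n_1$ and $n_2$, respectively. In analogy with (9.2), assume that $\mathbf{X}_1$ and $\mathbf{X}_2$ have the joint pdf given by

\[
f(\mathbf{X}_1, \mathbf{X}_2) \propto |\mathbf{R}|^{-n/2}\left[\nu - 2 + \sum_{i=1}^{2}\sum_{j=1}^{n_i}(\mathbf{x}_{ij} - \boldsymbol{\mu}_i)^T\mathbf{R}^{-1}(\mathbf{x}_{ij} - \boldsymbol{\mu}_i)\right]^{-(\nu+np)/2},
\]

where $n = n_1 + n_2$. It is immediate that $\mathbf{X}_{ij}$ is p-variate t with mean vector $\boldsymbol{\mu}_i$, correlation matrix $\mathbf{R}$, and degrees of freedom $\nu$. Also, the elements of the combined sample of size $n = n_1 + n_2$ are pairwise uncorrelated. The Hotelling's $T^2$ for testing equality of means takes the form

\[
T^2 = \frac{n_1n_2}{n_1+n_2}\left(\bar{\mathbf{X}}_1 - \bar{\mathbf{X}}_2\right)^T\mathbf{S}_{\text{pooled}}^{-1}\left(\bar{\mathbf{X}}_1 - \bar{\mathbf{X}}_2\right),
\]

where

\[
\mathbf{S}_{\text{pooled}} = \frac{1}{n_1+n_2-2}\sum_{i=1}^{2}\sum_{j=1}^{n_i}\left(\mathbf{X}_{ij} - \bar{\mathbf{X}}_i\right)\left(\mathbf{X}_{ij} - \bar{\mathbf{X}}_i\right)^T.
\]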

Sutradhar (1990) derived the nonnull distribution of the $T^2$ statistic,
given by the pdf


P(E) = Žale- (0415-2) 


-p-1 
x By (« +5, mime) | (9.17) 


where 
T (k) (m)z"-! 


Pm) = tma
and
Ny N92 


Tp-l 
z = R — m). 
SE (Hı — Ha) (H — He) 


Note that, under $H_0: \boldsymbol{\mu}_1 = \boldsymbol{\mu}_2$, where $\delta = 0$, the pdf of $T^2$ in (9.17)


reduces to 

se = g (pm), 
which implies that, under $H_0$, $T^2(n_1 + n_2 - p - 1)/p$ has the usual F
distribution with degrees of freedom $p$ and $n_1 + n_2 - p - 1$. Thus the null
distribution remains the same as in the normal case. Furthermore, the
power of the Hotelling's $T^2$ test can be computed by using the nonnull
pdf in (9.17).
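
To make the two-sample statistic concrete, here is a minimal sketch that computes it from two data matrices. It assumes the pooled matrix is normalized by n1 + n2 - 2, which is the usual convention; the function name is hypothetical, and no p-value is computed because that step depends on the normalization adopted for the pooled matrix.

```python
import numpy as np


def hotelling_two_sample_T2(X1, X2):
    """Two-sample Hotelling T^2 with a pooled covariance matrix.

    Assumption for this sketch: S_pooled divides the pooled sum of
    squares and products by n1 + n2 - 2.
    """
    X1 = np.asarray(X1, dtype=float)
    X2 = np.asarray(X2, dtype=float)
    n1, n2 = X1.shape[0], X2.shape[0]
    d = X1.mean(axis=0) - X2.mean(axis=0)
    c1 = X1 - X1.mean(axis=0)
    c2 = X2 - X2.mean(axis=0)
    S_pooled = (c1.T @ c1 + c2.T @ c2) / (n1 + n2 - 2)
    # Solve S_pooled * y = d instead of forming the inverse explicitly.
    return n1 * n2 / (n1 + n2) * d @ np.linalg.solve(S_pooled, d)
```

For p = 1 the value reduces to the square of the classical pooled two-sample t statistic, and the statistic is unchanged when both samples undergo the same nonsingular affine transformation.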

Kozumi (1994) considered testing equality of means when the two samples $\mathbf{X}_1$ and $\mathbf{X}_2$ have mutually independent t distributions with equal correlation matrices and equal degrees of freedom. When the sample sizes are equal (say, $n_1 = n_2 = n$) the $T^2$ statistic is given by

\[
T^2 = n\,\bar{\mathbf{y}}^T\mathbf{S}_d^{-1}\bar{\mathbf{y}},
\]

where $\bar{\mathbf{y}}$ and $\mathbf{S}_d$ are, respectively, the sample mean and the sample covariance matrix of the differences $\mathbf{y}_j = \mathbf{x}_{1j} - \mathbf{x}_{2j}$. For unequal sample sizes, assuming without loss of generality that $n_1 < n_2$, the $T^2$ statistic is given by

\[
\tilde{T}^2 = n_1\,\bar{\mathbf{z}}^T\mathbf{S}_z^{-1}\bar{\mathbf{z}},
\]

where

\[
\mathbf{z}_j = \mathbf{x}_{1j} - \sqrt{\frac{n_1}{n_2}}\,\mathbf{x}_{2j} + \frac{1}{\sqrt{n_1n_2}}\sum_{k=1}^{n_1}\mathbf{x}_{2k} - \frac{1}{n_2}\sum_{k=1}^{n_2}\mathbf{x}_{2k}
\]

and $\bar{\mathbf{z}}$ and $\mathbf{S}_z$ are the sample mean and the sample covariance matrix of the $\mathbf{z}_j$'s. It should be noted that $\tilde{T}^2$ reduces to $T^2$ in the case $n_1 = n_2 = n$. Under $H_0: \boldsymbol{\mu}_1 = \boldsymbol{\mu}_2$, $(n_1 - p)\tilde{T}^2/(p(n_1 - 1))$ has the usual F distribution with degrees of freedom $p$ and $n_1 - p$. The nonnull pdf of $\tilde{T}^2$ is given by the infinite sum involving Student's t pdfs


n o mm = (n26)* Tlk +v) 
fE) = (nı — 1)T? C/A 2+ HB O/2 FF, (nı — p)/2) 


2 p/2+k— al: #2 =(nı /2+k) 
x s 1 3 
(— = 7 ( E Nı — z) 


xJ (z;n1, no, k,ô, v),
where $\delta = (\boldsymbol{\mu}_1 - \boldsymbol{\mu}_2)^T\mathbf{R}^{-1}(\boldsymbol{\mu}_1 - \boldsymbol{\mu}_2)$ and the integral


1 
J (a371,72,k,6,v) = J {z1 - z) t? {rade — 2) 
0 


—(k+v) 
w(r-B) art dz. 
ny ny 


Kozumi (1994) also provided an expression for the cdf of the $T^2$ statistic and
calculated its powers corresponding to the sizes $\alpha = 0.01, 0.05$, $p = 5$,
and for various values of $n_1$, $n_2$, $\delta$, and $\nu$.
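
The transformation used for unequal sample sizes can be sketched in a few lines. The code below assumes the Scheffe-type form of the z_j's, in which the first n1 rows of the second sample are paired with the first sample, and it assumes the statistic n1 * zbar^T S_z^{-1} zbar with S_z normalized by n1 - 1; the function names are hypothetical.

```python
import numpy as np


def scheffe_type_differences(X1, X2):
    """Transformed differences z_j for samples with n1 <= n2.

    Assumed form (a sketch, not a quotation of Kozumi's paper):
    z_j = x_1j - sqrt(n1/n2) x_2j
          + (n1 n2)^(-1/2) * sum_{k<=n1} x_2k - (1/n2) * sum_{k<=n2} x_2k.
    """
    X1 = np.asarray(X1, dtype=float)
    X2 = np.asarray(X2, dtype=float)
    n1, n2 = X1.shape[0], X2.shape[0]
    if n1 > n2:
        raise ValueError("label the samples so that n1 <= n2")
    head = X2[:n1]  # the n1 rows of the second sample that get paired
    return (X1 - np.sqrt(n1 / n2) * head
            + head.sum(axis=0) / np.sqrt(n1 * n2)
            - X2.mean(axis=0))


def unequal_size_T2(X1, X2):
    """n1 * zbar^T S_z^{-1} zbar computed from the transformed differences."""
    z = scheffe_type_differences(X1, X2)
    n1 = z.shape[0]
    zbar = z.mean(axis=0)
    zc = z - zbar
    S_z = zc.T @ zc / (n1 - 1)
    return n1 * zbar @ np.linalg.solve(S_z, zbar)
```

The mean of the z_j's equals the difference of the two sample means, and for n1 = n2 the z_j's collapse to the plain paired differences, which is why the statistic reduces to the equal-size form.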


9.4 Entropy and Kullback-Leibler Number 


The forms of entropy and Kullback-Leibler number for the multivariate 
t distribution were discussed earlier in Chapter 1 (see equations (1.27), 
(1.29), and (1.31)). Here, we shall discuss the corresponding sampling 
properties. 

The entropy for the central p-variate t involves the correlation matrix
$\mathbf{R}$, and it is known that the maximum likelihood estimator of $\mathbf{R}$ for a
sample of $n$ observations is based on the Wishart matrix $\mathbf{A}$ in (9.4).
Hence it is of interest to consider the sampling properties of the difference
$\delta = H(\mathbf{X}; \mathbf{A}) - H(\mathbf{X}; \mathbf{R})$. Guerrero-Cusumano (1996a) derived the
corresponding moment generating function, mean, variance, and some
asymptotics. Specifically,


mo = wen (p48) (38) [oH] G) 


and 


e z y= B- pios {rip - v) 
+ N(0,p) 


as $n \to \infty$, where $\psi(\cdot)$ denotes the digamma function. Note that
$\hat{H}_u(\mathbf{X}; \mathbf{R}) = H(\mathbf{X}; \mathbf{A}) - E(\delta)$ is an unbiased estimator for $H(\mathbf{X}; \mathbf{R})$
with $\operatorname{Var}(\hat{H}_u) = \operatorname{Var}(\delta)$. Also note that, as $\nu \to \infty$,


i n-i 
E(6) > aA) + plog 2, 


coinciding with the result given in Ahmed and Gokhale (1989) for the
multivariate normal distribution. The expression for $\operatorname{Var}(\delta)$ given above
is also valid for the multivariate normal distribution since it is independent
of $\nu$.

The Kullback-Leibler number for the central p-variate t is given by
(1.31). The corresponding maximum likelihood estimator for a sample
of $n$ observations is
A p y—2 
- =!) s 
a 2 og ( v ) 


Thus, the sampling quantity of interest is the difference $\delta = T(\mathbf{X}; \hat{\mathbf{R}}) -
T(\mathbf{X}; \mathbf{R})$. Guerrero-Cusumano (1996b, 1998) derived the corresponding
moment generating function, cumulant generating function, cumulants,
mean, variance, and some asymptotics. Specifically,


mo = (Fa) oa") 
BERO 
Ko = Foe (55) Hoer (=) -1r (3) 


+ {leer ==) — logIT (A) te 


i=l 


T(X;R) = o- Flog 


Var(5) = 3 {po (=) y0 (=) l l 
and
y+p 


2 

6 > X(v-+p)? 

as n — oo and v > oo. Furthermore, 
(n-1)T(X;R) > Xpp-1)/2 


and 


tr (B?) — 
vn—-15 > N (==) 
as n — oo, where B = Aaa? with A4 denoting the diagonal 
matrix of $\mathbf{A}$. In the latter limit, it is assumed that $\nu$ is known. When
$\nu$ is unknown, the limit still holds for $n$ sufficiently large. The exact
distribution of $\delta$ is quite complicated to obtain in closed form.


10 


Estimation 


The material in this chapter is of special interest to researchers attempt- 
ing to model various phenomena based on multivariate t distributions. 
We shall start with a popular result in the bivariate case. 


10.1 Tiku and Kambo’s Estimation Procedure 


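In Chapter 4, we studied a bivariate t distribution due to Tiku and Kambo (1992) given by the joint pdf

\[
f(x_1, x_2) \propto \frac{1}{\sigma_1\sigma_2\sqrt{k(1-\rho^2)}}
\left\{1 + \frac{(x_2-\mu_2)^2}{k\sigma_2^2}\right\}^{-\nu}
\exp\left[-\frac{1}{2\sigma_1^2(1-\rho^2)}\left\{x_1 - \mu_1 - \rho\frac{\sigma_1}{\sigma_2}(x_2 - \mu_2)\right\}^2\right]. \tag{10.1}
\]

Here, we discuss estimation of the parameters $\mu_1$, $\mu_2$, $\sigma_1$, $\sigma_2$, and $\rho$ when $\nu$ is known. The method for estimating the location and scale parameters developed by Tiku and Suresh (1992) is used for this problem. For a random sample $\{(X_{1i}, X_{2i}), i = 1, \ldots, n\}$ from (10.1), the likelihood function is

\[
L \propto \left\{\sigma_1^2\sigma_2^2(1-\rho^2)\right\}^{-n/2}
\prod_{i=1}^{n}\left\{1 + \frac{(x_{2(i)}-\mu_2)^2}{k\sigma_2^2}\right\}^{-\nu}
\exp\left[-\frac{1}{2\sigma_1^2(1-\rho^2)}\sum_{i=1}^{n}\left\{x_{1[i]} - \mu_1 - \rho\frac{\sigma_1}{\sigma_2}\left(x_{2(i)} - \mu_2\right)\right\}^2\right],
\]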

where $k = 2\nu - 3$, $X_{2(i)}$, $i = 1, \ldots, n$ are the order statistics of $X_{2i}$ and
$X_{1[i]}$, $i = 1, \ldots, n$ are the corresponding concomitant $X_1$ observations.
Consider the following three situations:

(i) Complete samples are available and $\nu$ is not too small ($\nu > 3$).

(ii) Complete samples are available but $\nu$ is small ($\nu < 3$).

(iii) A few smallest or a few largest $X_{2i}$ observations and the corresponding
concomitant $X_{1[i]}$ are censored due to the constraints
of an experiment. This situation arises in numerous practical situations.
In a time mortality experiment, for example, $n$ mice are
inoculated with a uniform culture of human tuberculosis. What
is recorded is $X_{2(i)}$: the time to death of the first $A$ ($< n$) mice,
and $X_{1[i]}$: the corresponding weights at the time of death.


These situations also arise in the context of ranking and selection (David,
1982). We provide some details of the inference for situation (i) as
described in Tiku and Kambo (1992). Using a linear approximation of
the likelihood based on the expected values of order statistics, it is shown
that the maximum likelihood estimators are


~ uaa PO) / 
Į = £1 — = (Z2 — p2), 
02 
s T 
a ES 2 12 2 
ju = sit 2 (3- | 
2 \ 82 
fe = 2- = (ūīı —p), 
2 72 
s G 
Be 2 12 1 
m2 ē = 2 + 2 2 1}, 
Si Si 
and 
a _ $1202 
= 2 
$5 01 


where $(\bar{x}_1, \bar{x}_2)$ are the usual sample means, $(s_1^2, s_2^2)$ are the usual sample variances, and $s_{12}$ is the sample covariance. The estimators $\hat{\mu}_1$, $\hat{\mu}_2$, $\hat{\sigma}_1$, $\hat{\sigma}_2$, and $\hat{\rho}$ are asymptotically unbiased and minimum variance bound estimators. The estimator $\hat{\sigma}_2$ is always real and positive while the estimator $\hat{\rho}$ always assumes values between $-1$ and $1$. The asymptotic variances and covariances of the estimators can be written as

\[
\mathbf{V} = \begin{pmatrix} \mathbf{V}_1 & \mathbf{0} \\ \mathbf{0} & \mathbf{V}_2 \end{pmatrix},
\]

where
y 1 o? Poo 2mv —nk / poa?o? poyo? 
ı = >- - > 
n \ pozoi o2 2vmno? poło: Os 


(10.2) 


is positive definite and is the asymptotic variance-covariance matrix of
$(\hat{\mu}_1, \hat{\mu}_2)$ while


o? po\o2 po; (1 = P) 


Vo = ES p01 02 o3 po2 (1 a P) 
pai (1- ø) po2(1—p?) 2(1-#) 
5 prot p°o\02 Po (1- ø) 
mO p°0102 o3 por (1 — p°) 
n < 
Pa(l-P) po2(1—p?) P-e? 


is positive definite and is the asymptotic variance-covariance matrix of
$(\hat{\sigma}_1, \hat{\sigma}_2, \hat{\rho})$. The parameters $m$ and $\delta$ are determined by the linear
approximation of the likelihood. Interestingly, $\operatorname{Var}(\hat{\mu}_1)$ and $\operatorname{Var}(\hat{\mu}_2)$
decrease with increasing $\rho^2$ unless $\nu = \infty$. The first component on the
right of (10.2) is the variance-covariance matrix of $\hat{\mu}_1$ and $\hat{\mu}_2$ under bivariate
normality, and the first component on the right of (10.3) is the
asymptotic variance-covariance matrix of $\hat{\sigma}_1$, $\hat{\sigma}_2$, and $\hat{\rho}$ under bivariate
normality. The second components in (10.2) and (10.3) represent the
effect of nonnormality due to the family (10.1). The asymptotic distribution
of $\sqrt{n}(\hat{\mu}_1 - \mu_1, \hat{\mu}_2 - \mu_2)$ is bivariate normal with zero means and
variance-covariance matrix $n\mathbf{V}_1$. For testing $H_0: (\mu_1, \mu_2) = (0, 0)$ versus
$H_1: (\mu_1, \mu_2) \neq (0, 0)$, a useful statistic is $T^2 = (\hat{\mu}_1, \hat{\mu}_2)\mathbf{V}_1^{-1}(\hat{\mu}_1, \hat{\mu}_2)^T$,
the asymptotic null distribution of which is chi-squared with degrees
of freedom 2. The asymptotic nonnull distribution is noncentral chi-squared
with degrees of freedom 2 and noncentrality parameter


2mv 2 
T (BE) (BY. 
kn 02 
where 


= = ral) OC 


Note that Às is the noncentrality parameter of the asymptotic nonnull 
distribution of the Hotelling's $T^2$ statistic based on the sample means
$(\bar{x}_1, \bar{x}_2)$, sample variances $(s_1^2, s_2^2)$, and the sample correlation coefficient
$r = s_{12}/(s_1s_2)$. Tiku and Kambo (1992) also provided evidence that
the use of the statistic based on $(\hat{\mu}_1, \hat{\mu}_2)$ in place of the Hotelling's $T^2$
statistic can result in a substantial gain in power.


10.2 ML Estimation via EM Algorithm 


Consider fitting a p-variate t distribution to data $\mathbf{x}_1, \ldots, \mathbf{x}_n$ with the log-likelihood function

\[
L(\boldsymbol{\mu}, \mathbf{R}) = -\frac{n}{2}\log|\mathbf{R}| - \frac{\nu+p}{2}\sum_{i=1}^{n}\log(\nu + s_i), \tag{10.4}
\]

where $s_i = (\mathbf{x}_i - \boldsymbol{\mu})^T\mathbf{R}^{-1}(\mathbf{x}_i - \boldsymbol{\mu})$ and $\nu$ is assumed to be fixed. Differentiating (10.4) with respect to $\boldsymbol{\mu}$ and $\mathbf{R}$ leads to the estimating equations

\[
\boldsymbol{\mu} = \operatorname{ave}\{w_i\mathbf{x}_i\}/\operatorname{ave}\{w_i\} \tag{10.5}
\]

and

\[
\mathbf{R} = \operatorname{ave}\left\{w_i(\mathbf{x}_i - \boldsymbol{\mu})(\mathbf{x}_i - \boldsymbol{\mu})^T\right\}, \tag{10.6}
\]

where $w_i = (\nu + p)/(\nu + s_i)$ and "ave" stands for the arithmetic average over $i = 1, 2, \ldots, n$. Note that equations (10.5)-(10.6) can be viewed as an adaptively weighted sample mean and sample covariance matrix where the weights depend on the Mahalanobis distance between $\mathbf{x}_i$ and $\boldsymbol{\mu}$. The weight function $w(s) = (\nu + p)/(\nu + s)$, where $s = (\mathbf{x} - \boldsymbol{\mu})^T\mathbf{R}^{-1}(\mathbf{x} - \boldsymbol{\mu})$, is a decreasing function of $s$, so that the outlying observations are downweighted. Maronna (1976) proved, under general assumptions, the existence, uniqueness, consistency, and asymptotic normality of the solutions of (10.5)-(10.6). For instance, if there exists $a > 0$ such that, for every hyperplane $H$, $\Pr(H) \leq p/(\nu + p) - a$, then (10.5)-(10.6) has a unique solution. Also, every solution satisfies the consistency property that $\lim_{n \to \infty}(\hat{\boldsymbol{\mu}}, \hat{\mathbf{R}}) = (\boldsymbol{\mu}, \mathbf{R})$ with probability 1.

The standard approach for solving (10.5)-(10.6) for $\boldsymbol{\mu}$ and $\mathbf{R}$ is the popular EM algorithm because of its simplicity and stable convergence (Dempster et al., 1977; Wu, 1983). The EM algorithm takes the form of iterative updates of (10.5)-(10.6), using the current estimates of $\boldsymbol{\mu}$ and $\mathbf{R}$ to generate the weights. The iterations take the form

\[
\boldsymbol{\mu}^{(m+1)} = \operatorname{ave}\left\{w_i^{(m)}\mathbf{x}_i\right\}/\operatorname{ave}\left\{w_i^{(m)}\right\}
\]

and

\[
\mathbf{R}^{(m+1)} = \operatorname{ave}\left\{w_i^{(m)}\left(\mathbf{x}_i - \boldsymbol{\mu}^{(m+1)}\right)\left(\mathbf{x}_i - \boldsymbol{\mu}^{(m+1)}\right)^T\right\},
\]

where

\[
w_i^{(m)} = (\nu + p)\Big/\left\{\nu + \left(\mathbf{x}_i - \boldsymbol{\mu}^{(m)}\right)^T\left(\mathbf{R}^{(m)}\right)^{-1}\left(\mathbf{x}_i - \boldsymbol{\mu}^{(m)}\right)\right\}.
\]

This is known as the direct EM algorithm and is valid for any $\nu > 0$. For details of this algorithm see the pioneering papers of Dempster et al. (1977, 1980), Rubin (1983), and Little and Rubin (1987). Several variants of the above have been proposed in the literature, as summarized in the table below.


Algorithm        Primary References

Extended EM      Kent et al. (1994), Arslan et al. (1995)
Restricted EM    Arslan et al. (1995)
MC-ECM1          Liu and Rubin (1995)
MC-ECM2          Liu and Rubin (1995), Meng and van Dyk (1997)
ECME1            Liu and Rubin (1995), Liu (1997)
ECME2            Liu and Rubin (1995)
ECME3            Liu and Rubin (1995)
ECME4            Liu and Rubin (1995)
ECME5            Liu (1997)
PXEM             Liu et al. (1998)


, Meng and van Dyk (1997) 
, Liu (1997) 
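
The direct EM iteration for fixed degrees of freedom can be sketched as follows. This is a minimal illustration rather than a reference implementation: it assumes the scatter update uses the freshly updated location, starts from the sample mean and sample covariance, and stops on a simple absolute-change rule; the function names, starting values, and tolerance are all choices made here.

```python
import numpy as np


def mahalanobis_sq(X, mu, R):
    """Squared Mahalanobis distances s_i = (x_i - mu)^T R^{-1} (x_i - mu)."""
    diff = X - mu
    return np.einsum("ij,ij->i", diff @ np.linalg.inv(R), diff)


def fit_mvt_direct_em(X, nu, max_iter=1000, tol=1e-10):
    """Iterate the weighted mean and scatter updates with nu held fixed."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    mu = X.mean(axis=0)
    centered = X - mu
    R = centered.T @ centered / n
    for _ in range(max_iter):
        # Weights shrink as the Mahalanobis distance grows.
        w = (nu + p) / (nu + mahalanobis_sq(X, mu, R))
        mu_new = w @ X / w.sum()
        diff = X - mu_new
        R_new = (w[:, None] * diff).T @ diff / n
        change = max(np.abs(mu_new - mu).max(), np.abs(R_new - R).max())
        mu, R = mu_new, R_new
        if change < tol:
            break
    return mu, R
```

At convergence the returned pair satisfies the weighted-mean and weighted-scatter equations with weights evaluated at the solution, and a gross outlier receives a weight close to zero, so it barely moves the location estimate.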


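Consider the maximum likelihood (ML) estimation for a g-component mixture of t distributions given by

\[
f(\mathbf{x}; \boldsymbol{\Psi}) = \sum_{i=1}^{g}\pi_i f(\mathbf{x}; \boldsymbol{\mu}_i, \mathbf{R}_i, \nu_i),
\]

where

\[
f(\mathbf{x}; \boldsymbol{\mu}_i, \mathbf{R}_i, \nu_i) = \frac{\Gamma\left((\nu_i + p)/2\right)}{(\pi\nu_i)^{p/2}\,\Gamma(\nu_i/2)\,|\mathbf{R}_i|^{1/2}}
\left[1 + \frac{(\mathbf{x} - \boldsymbol{\mu}_i)^T\mathbf{R}_i^{-1}(\mathbf{x} - \boldsymbol{\mu}_i)}{\nu_i}\right]^{-(\nu_i+p)/2},
\]

$\boldsymbol{\Psi} = (\pi_1, \ldots, \pi_{g-1}, \boldsymbol{\theta}^T, \boldsymbol{\nu}^T)^T$, $\boldsymbol{\theta} = ((\boldsymbol{\mu}_1, \mathbf{R}_1)^T, \ldots, (\boldsymbol{\mu}_g, \mathbf{R}_g)^T)^T$, and $\boldsymbol{\nu} = (\nu_1, \ldots, \nu_g)^T$. The application of the EM algorithm for this model in a clustering context has been considered by McLachlan and Peel (1998) and Peel and McLachlan (2000). The iteration updates now take the form

\[
\boldsymbol{\mu}_i^{(m+1)} = \sum_{j=1}^{n}\tau_{ij}^{(m)}u_{ij}^{(m)}\mathbf{x}_j \Big/ \sum_{j=1}^{n}\tau_{ij}^{(m)}u_{ij}^{(m)}
\]

and

\[
\mathbf{R}_i^{(m+1)} = \sum_{j=1}^{n}\tau_{ij}^{(m)}u_{ij}^{(m)}\left(\mathbf{x}_j - \boldsymbol{\mu}_i^{(m+1)}\right)\left(\mathbf{x}_j - \boldsymbol{\mu}_i^{(m+1)}\right)^T \Big/ \sum_{j=1}^{n}\tau_{ij}^{(m)},
\]

where

\[
u_{ij}^{(m)} = \frac{\nu_i^{(m)} + p}{\nu_i^{(m)} + \left(\mathbf{x}_j - \boldsymbol{\mu}_i^{(m)}\right)^T\left(\mathbf{R}_i^{(m)}\right)^{-1}\left(\mathbf{x}_j - \boldsymbol{\mu}_i^{(m)}\right)}
\]

and

\[
\tau_{ij}^{(m)} = \frac{\pi_i^{(m)} f\left(\mathbf{x}_j; \boldsymbol{\mu}_i^{(m)}, \mathbf{R}_i^{(m)}, \nu_i^{(m)}\right)}{f\left(\mathbf{x}_j; \boldsymbol{\Psi}^{(m)}\right)}.
\]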

The EMMIX program of McLachlan et al. (1999) for the fitting of nor- 
mal mixture models has an option that implements the above procedure 
for the fitting of mixtures of t-components. The program automatically 
generates a selection of starting values for the fitting if they are not 
provided by the user. The user only has to provide the data set, the 
restrictions on the component-covariance matrices (equal, unequal, di- 
agonal), the extent of the selection of the initial groupings to be used to 
determine the starting values, and the number of components that are 
to be fitted. The program is available from the software archive StatLib 
or from Professor Peel’s homepage at the Web site address 


http://www.maths.uq.edu.au/~gjm/ 


10.3 Missing Data Imputation 


When a data set contains missing values, multiple imputation for missing
data (Rubin, 1987) appears to be an ideal technique. Most importantly,
it allows for valid statistical inferences. In contrast, any single imputation
method, such as filling in the missing values with either their
marginal means or their predicted values from linear regression, typically
leads to biased estimates of parameters and thereby often to an
invalid inference (Rubin, 1987, pages 11-15).


The multivariate normal distribution has been a popular statistical 
model in practice for rectangular continuous data sets. To impute the 
missing values in an incomplete normal data set, Rubin and Schafer 
(1990) (see also Schafer, 1997, and Liu, 1993) proposed an efficient 
method, called monotone data augmentation (MDA), and implemented 
it using the factorized likelihood approach. A more efficient technique to 
implement the MDA than the factorized likelihood approach is provided 
by Liu (1993) using Bartlett’s decomposition, which is the extension of 
the Bayesian version of Bartlett’s decomposition of the Wishart distribu- 
tion with complete rectangular normal data to the case with monotone 
ignorable missing data. 


When a rectangular continuous data set appears to have longer tails
than the normal distribution, or it contains some values that are influential
for statistical inferences with the normal distribution, the multivariate
t distribution becomes useful for multiple imputation as an alternative
to the multivariate normal distribution. First, when the data have
longer tails than the normal distribution, the multiply imputed data
sets using the t distribution allow more valid statistical inferences than
those using the normal distribution with some "influential" observations
deleted. Second, it is well known that the t distribution is widely used
in applied statistics for robust statistical inferences. Therefore, when an
incomplete data set contains some influential values or outliers, the t distribution
allows for a robust multiple imputation method. Furthermore,
the multiple imputation appears to be more useful than the asymptotic
method of inference since the likelihood functions of the parameters of
the t distribution given the observed data can have multiple modes. For
a complete description of the MDA using the multivariate t distribution,
see Liu (1995). See also Liu (1996) for extensions in two aspects: including
covariates in the multivariate t models (as in Liu and Rubin, 1995),
and replacing the multivariate t distribution with a more general class
of distributions, that is, the class of normal/independent distributions
(as in Lange and Sinsheimer, 1993). These extensions provide a flexible
class of models for robust multivariate linear regression and multiple
imputation. Liu (1996) described methods to implement the MDA for
these models with fully observed predictor variables and possible missing
values from outcome variables.


10.4 Laplacian T-Approximation


The Laplacian T-approximation (Sun et al., 1996) is a useful tool for
Bayesian inferences for variance component models. Let $p(\boldsymbol{\theta} \mid \mathbf{y})$ be the
posterior pdf of $\boldsymbol{\theta} = (\theta_1, \ldots, \theta_p)^T$ given data $\mathbf{y}$, and let $\eta = g(\boldsymbol{\theta})$ be
the parameter of interest. Leonard et al. (1994) introduced a Laplacian
T-approximation for the marginal posterior of $\eta$ of the form


p* (nly) «x T, p(Only) APF (njw, 03, T) (10.7) 


to be the marginal posterior pdf of $\eta$, where
Qn, 


w 
T ——— 
a" (w+ p)Ay 
T aiy 
5 ont, 
7 w+p-1’ 
21,17 
7 w+p-1’ 


dlogp (8 |y) 
066-0, 


3” logp (8 | y) 

T $ 

a (007) 0-0, 
ee s 0n +Q7'1,; 


n 


and $f(\eta \mid w, \boldsymbol{\theta}_\eta^*, \mathbf{T}_\eta)$ denotes the pdf of $\eta = g(\boldsymbol{\theta})$ when $\boldsymbol{\theta}$ possesses
a multivariate t distribution with mean vector $\boldsymbol{\theta}_\eta^*$, covariance matrix
$\mathbf{T}_\eta$, and degrees of freedom $w$. Here, $\boldsymbol{\theta}_\eta$ represents some convenient
approximation to the conditional posterior mean vector of $\boldsymbol{\theta}$, given $\eta$, and
$w$ should be taken to roughly approximate the degrees of freedom of a
generalized multivariate T-approximation to the conditional distribution
of $\boldsymbol{\theta}$ given $\eta$.

When $\boldsymbol{\theta}_\eta$ is the conditional posterior mode vector of $\boldsymbol{\theta}$, given $\eta$, (10.7)
reduces to the Laplacian approximation introduced by Leonard (1982)
and shown by Tierney and Kadane (1986) and Leonard et al. (1989)
to possess saddlepoint accuracy as well as an excellent finite-sample accuracy,
in many special cases. It was previously used for hierarchical
models by Kass and Steffey (1989).

In the special case where $\eta = \mathbf{a}^T\boldsymbol{\theta}$ is a linear combination of the $\theta$'s,
the approximation (10.7) is equivalent to

-1/ lw š -1 
p (nly) o (En p (Only) Az Ot, (w,aT0;, (aTa) ~>), 


where t„(w, u, T) denotes a generalized t pdf. 


10.5 Sutradhar’s Score Test 
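Consider a random sample $\mathbf{X}_1, \ldots, \mathbf{X}_n$ from a p-variate t distribution with the pdf

\[
f(\mathbf{x}_j) = \frac{(\nu-2)^{\nu/2}\,\Gamma\left((\nu+p)/2\right)}{\pi^{p/2}\,\Gamma(\nu/2)\,|\mathbf{R}|^{1/2}}
\left[\nu - 2 + (\mathbf{x}_j - \boldsymbol{\mu})^T\mathbf{R}^{-1}(\mathbf{x}_j - \boldsymbol{\mu})\right]^{-(\nu+p)/2}.
\]

Note this is a slight reparameterization of the usual t pdf. The log-likelihood

\[
G = \sum_{j=1}^{n}\log f(\mathbf{x}_j)
\]

is a function of the parameters $\mathbf{R}$, $\boldsymbol{\mu}$, and $\nu$.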

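Frequently in social sciences, and particularly in factor analysis, one of the main inference problems is to test the null hypothesis $\mathbf{R} = \mathbf{R}_0$ when $\boldsymbol{\mu}$ and $\nu$ are unknown. Sutradhar (1993) proposed Neyman's (1959) score test for this problem for large $n$. Let $\mathbf{r} = (r_{11}, \ldots, r_{hl}, \ldots, r_{pp})^T$ be the $p(p+1)/2 \times 1$ vector formed by stacking the distinct elements of $\mathbf{R}$, with $r_{hl}$ being the $(h, l)$th element of the $p \times p$ matrix $\mathbf{R}$. Also let

\[
\left(\lambda_1, \ldots, \lambda_i, \ldots, \lambda_{p(p+1)/2}\right)^T = \mathbf{b}(\mathbf{r}_0, \hat{\boldsymbol{\mu}}, \hat{\nu})
\]

and

\[
\left(\gamma_1, \ldots, \gamma_{p+1}\right)^T = \left[\boldsymbol{\xi}^T(\mathbf{r}_0, \hat{\boldsymbol{\mu}}, \hat{\nu}),\ \eta(\mathbf{r}_0, \hat{\boldsymbol{\mu}}, \hat{\nu})\right]^T,
\]

where $\mathbf{b}(\mathbf{r}_0, \hat{\boldsymbol{\mu}}, \hat{\nu})$, $\boldsymbol{\xi}(\mathbf{r}_0, \hat{\boldsymbol{\mu}}, \hat{\nu})$, and $\eta(\mathbf{r}_0, \hat{\boldsymbol{\mu}}, \hat{\nu})$ are the score functions obtained under the null hypothesis $\mathbf{r} = \mathbf{r}_0$, by replacing $\boldsymbol{\mu}$ and $\nu$ with their consistent estimates $\hat{\boldsymbol{\mu}}$ and $\hat{\nu}$ in

\[
\mathbf{b}(\mathbf{r}_0, \boldsymbol{\mu}, \nu) = \frac{\partial G}{\partial\mathbf{r}}, \tag{10.8}
\]

\[
\boldsymbol{\xi}(\mathbf{r}_0, \boldsymbol{\mu}, \nu) = \frac{\partial G}{\partial\boldsymbol{\mu}}, \tag{10.9}
\]

and

\[
\eta(\mathbf{r}_0, \boldsymbol{\mu}, \nu) = \frac{\partial G}{\partial\nu}, \tag{10.10}
\]

respectively. Furthermore, let $T_i(\mathbf{r}_0, \hat{\boldsymbol{\mu}}, \hat{\nu}) = \lambda_i - \sum_{j=1}^{p+1}\beta_{ij}\gamma_j$, where $\beta_{ij}$ is the partial regression coefficient of $\lambda_i$ on $\gamma_j$. Then, Neyman's partial score test statistic is given by

\[
W(\hat{\boldsymbol{\mu}}, \hat{\nu}) = \mathbf{T}^T\left[\hat{\mathbf{M}}_{11} - \left(\hat{\mathbf{M}}_{12}\ \hat{\mathbf{M}}_{13}\right)
\begin{pmatrix}\hat{\mathbf{M}}_{22} & \hat{\mathbf{M}}_{23} \\ \hat{\mathbf{M}}_{23}^T & \hat{\mathbf{M}}_{33}\end{pmatrix}^{-1}
\begin{pmatrix}\hat{\mathbf{M}}_{12}^T \\ \hat{\mathbf{M}}_{13}^T\end{pmatrix}\right]^{-1}\mathbf{T}, \tag{10.11}
\]

where $\mathbf{T} = [T_1(\mathbf{r}_0, \hat{\boldsymbol{\mu}}, \hat{\nu}), \ldots, T_{p(p+1)/2}(\mathbf{r}_0, \hat{\boldsymbol{\mu}}, \hat{\nu})]^T$; $\hat{\mathbf{M}}_{ir}$, for $i, r = 1, 2, 3$, are obtained from $\mathbf{M}_{ir} = E(-\mathbf{D}_{ir})$ by replacing $\boldsymbol{\mu}$ and $\nu$ with their consistent estimates; and $\mathbf{D}_{ir}$ for $i, r = 1, 2, 3$ are the derivatives given by

\[
\mathbf{D}_{11} = \frac{\partial^2 G}{\partial\mathbf{r}\,\partial\mathbf{r}^T}, \qquad
\mathbf{D}_{12} = \frac{\partial^2 G}{\partial\mathbf{r}\,\partial\boldsymbol{\mu}^T}, \qquad
\mathbf{D}_{13} = \frac{\partial^2 G}{\partial\mathbf{r}\,\partial\nu},
\]

\[
\mathbf{D}_{22} = \frac{\partial^2 G}{\partial\boldsymbol{\mu}\,\partial\boldsymbol{\mu}^T}, \qquad
\mathbf{D}_{23} = \frac{\partial^2 G}{\partial\boldsymbol{\mu}\,\partial\nu},
\]

and

\[
D_{33} = \frac{\partial^2 G}{\partial\nu^2}.
\]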

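Under the null hypothesis $\mathbf{r} = \mathbf{r}_0$, the test statistic $W(\hat{\boldsymbol{\mu}}, \hat{\nu})$ has an approximate chi-squared distribution with degrees of freedom $p(p+1)/2$. The test based on (10.11) is asymptotically locally most powerful. Clearly the implementation of this test requires consistent estimates $\hat{\boldsymbol{\mu}}$, $\hat{\nu}$ as well as expressions for the score functions and the information matrix. The maximum likelihood estimates of $\boldsymbol{\mu}$ and $\nu$ are obtained by simultaneously solving

\[
\hat{\boldsymbol{\mu}} = \sum_{j=1}^{n} q_j^{-1}\mathbf{X}_j \Big/ \sum_{j=1}^{n} q_j^{-1}
\]

and

\[
\eta(\hat{\boldsymbol{\mu}}, \mathbf{r}_0, \hat{\nu}) = 0,
\]

where $q_j = \hat{\nu} - 2 + (\mathbf{X}_j - \hat{\boldsymbol{\mu}})^T\mathbf{R}_0^{-1}(\mathbf{X}_j - \hat{\boldsymbol{\mu}})$ and $\mathbf{R}_0$ is specified by the null hypothesis. The moment estimates of $\boldsymbol{\mu}$ and $\nu$ (which also turn out to be consistent) are

\[
\hat{\boldsymbol{\mu}} = \bar{\mathbf{X}} = \frac{1}{n}\sum_{j=1}^{n}\mathbf{X}_j
\]

and

\[
\hat{\nu} = \frac{2\left\{2\hat{\beta}_2 - f(\mathbf{r}_0)\right\}}{\hat{\beta}_2 - f(\mathbf{r}_0)},
\]

where

\[
\hat{\beta}_2 = \frac{1}{n}\sum_{j=1}^{n}\left[(\mathbf{X}_j - \bar{\mathbf{X}})^T\mathbf{R}_0^{-1}(\mathbf{X}_j - \bar{\mathbf{X}})\right]^2
\]

is a consistent estimator of the multivariate measure of kurtosis (see, for example, Mardia, 1970b), and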


P 


BD 08) {ADP + D> O N 


h=1 hh! 


where $r^0_{hh'}$ and $r_0^{hh'}$ denote the $(h, h')$th element of $R_0$ and $R_0^{-1}$, respectively. 


10.5.1 Score Functions 
The score functions defined in (10.8), (10.9), and (10.10) are given by 


R 1 aS i 
blr, ñ, D) = -3 ryt oom Sata h 


j=l 
$$\xi(r, \mu, \nu) = (\nu + p) R^{-1} \sum_{j=1}^{n} q_j^{-1} (X_j - \mu),$$
and 


iBS legee -3+ (F) -r G) 


-> Jer» yoo +E , 


ari m 


respectively, where $\psi(\cdot)$ denotes the digamma function and $q_j$ is a 
nonhomogeneous quadratic form given by $q_j = \nu - 2 + \mathrm{tr}\, R^{-1} B_j$ with 
$B_j = (X_j - \mu)(X_j - \mu)^T$. 


10.5.2 Information Matrix 
By taking the second derivatives and then applying expectations, one 


can derive the elements of the information matrix. The first element 
takes the complicated form 


Mi; = [m*(1,1),m*(1,2),...,m*(h,l),...,m*(p,p)], 


where, for l > h, h,l = 1,...,p, m*(h,1) is the p(p + 1)/2-dimensional 
vector, formed by stacking the distinct elements of the p x p symmetric 
matrix 


+ 
Mh. = > [h ® (r')"] = ny +P) p) R Qa, R}. 


Here, $r^k$ denotes the $k$th column of the $R^{-1}$ matrix, and the $(u, v)$th 
element of the $p \times p$ matrix $Q_{hl}$ is given by 


(v + 2)? S YE rhit (r Tuv + Tiufkv +TivfTku) 
2z en SE tS eee ; alk wT ke 
(v+4)2(v+ p)\(v +p 2) ES ikuv iuf kv ivfku) 
where $r^{ms}$ and $r_{ms}$ denote the $(m, s)$th element of $R^{-1}$ and $R$, respectively. 
The second element of the information matrix, $M_{12}$, is zero. The 
third element $M_{13}$ is formed by stacking the distinct elements of the 
symmetric matrix 


n(p + 2) -1 
(v—2)(v+p)(v+p+t 2) 
The remaining elements of the information matrix are given by 

$$M_{22} = \frac{n\nu(\nu + p)}{(\nu - 2)(\nu + p + 2)} R^{-1},$$

and 


- 1 ,fvt+p 1 ,/v v—4 
M= n [iw ( 2 j- 7 (G) -ai 
1 nv (v? + vp — 6p — 2v — 8) 
2 (v—2)?(v + p)\(v +p+2)- 


10.6 Multivariate t Model 


Consider the following multivariate t model described in equation (9.2) 
of the preceding chapter: 

$$f(x_1, \ldots, x_n) = \frac{\Gamma((\nu + np)/2)}{(\pi\nu)^{np/2} \, \Gamma(\nu/2) \, |R|^{n/2}} \left[ 1 + \frac{1}{\nu} \sum_{i=1}^{n} (x_i - \mu)^T R^{-1} (x_i - \mu) \right]^{-(\nu + np)/2}. \qquad (10.12)$$

In this section, we consider estimation issues associated with the 
correlation matrix $R$ and its trace $\mathrm{tr}(R)$. 


10.6.1 Estimation of R 


Joarder and Ali (1997) developed estimators of $R$ (when the mean vector 
$\mu$ is unknown) under the entropy loss function 

$$L(u(A), R) = \mathrm{tr}(R^{-1} u(A)) - \log |R^{-1} u(A)| - p,$$

where $u(A)$ is any estimator of $R$ based on the Wishart matrix $A$ defined 
in (9.4). Based on the form of the likelihood function, the entropy loss 
function has been suggested in the literature by James and Stein (1961) 
and is sometimes known as the Stein loss function. Some important 
features of the entropy loss function are that it is zero if the estimator 
$u(A)$ equals the parameter $R$, positive when $u(A) \neq R$, and invariant 
under translation as well as under a natural group of transformations of 
covariance matrices. Moreover, the loss function approaches infinity as 
the estimator approaches a singular matrix or when one or more elements 
(or one or more latent roots) of the estimator approach infinity. This 
means that gross underestimation is penalized just as heavily as gross 
overestimation. 

In estimating $R$ by $u(A)$, Joarder and Ali (1997) considered the risk 
function $R(u(A), R) = E[L(u(A), R)]$. An estimator $u_2(A)$ of $R$ will 
be said to dominate another estimator $u_1(A)$ of $R$ if, for all $R$ belonging 
to the class of positive definite matrices, the inequality $R(u_2(A), R) \leq R(u_1(A), R)$ 
holds and the strict inequality $R(u_2(A), R) < R(u_1(A), R)$ holds 
for at least one $R$. 

Joarder and Ali (1997) obtained three estimators for $R$ by minimizing 
the risk function of the entropy loss function among three classes of 
estimators. 


• First, it is shown that the unbiased estimator $\tilde{R} = (\nu - 2)A/(\nu n)$ has 
the smallest risk among the class of estimators of the form $cA$, where 
$c > 0$, and the corresponding minimum risk is given by 

$$R(\tilde{R}, R) = p \log n - \sum_{i=1}^{p} E\left[\log\left(\chi^2_{n+1-i}\right)\right] + p \log\left(\frac{\nu}{\nu - 2}\right) - 2p E(\log \tau),$$

where $\tau$ has the inverted gamma distribution given by (9.3). 

• Second, the estimator $R^* = T D^* T^T$, where $T$ is a lower triangular 
matrix such that $A = T T^T$ and $D^* = \mathrm{diag}(d_1^*, \ldots, d_p^*)$ with $d_i^*$ defined 
by 

$$d_i^* = \frac{\nu - 2}{\nu} \cdot \frac{1}{n + p + 1 - 2i},$$

has the smallest risk among the class of estimators $T \Delta T^T$, where $\Delta$ 
belongs to the class of all positive definite diagonal matrices, and the 
corresponding minimum risk function of $R^*$ is given by 

$$R(R^*, R) = \sum_{i=1}^{p} \log(n + 1 + p - 2i) - \sum_{i=1}^{p} E\left[\log\left(\chi^2_{n+1-i}\right)\right] + p \log\left(\frac{\nu}{\nu - 2}\right) - 2p E(\log \tau),$$

where $\tau$ is as defined above. Furthermore, $R^*$ dominates the unbiased 
estimator $\tilde{R} = (\nu - 2)A/(\nu n)$. 

• Finally, consider the estimator $\hat{R} = S \phi(M) S^T$, where $A$ has the 
spectral decomposition $A = S M S^T$, with $\phi(M) = D^* M$. Then the 
estimator $\hat{R} = S D^* M S^T$ dominates the estimator $R^* = T D^* T^T$. 


10.6.2 Estimation of tr(R) 


Let $\delta = \mathrm{tr}(R)$ denote the trace of $R$. Joarder (1995) considered the 
estimation of $\delta$ for the multivariate t model under a squared error loss 
function following Dey (1988). The usual estimator of $\delta$ is given by 
$\tilde{\delta} = c_0 \mathrm{tr}(A)$, where $c_0$ is a known positive constant and $A$ is the Wishart 
matrix defined in (9.4). The estimator $\tilde{\delta}$ is an unbiased estimator 
of $\delta$ for $c_0 = (\nu - 2)/(\nu n)$ and a maximum likelihood estimator of $\delta$ for 
$c_0 = 1/(n + 1)$ (see, for example, Fang and Anderson, 1990, page 208). 
Joarder and Singh (1997) proposed an improved estimator of $\delta$, based 
on a power transformation, given by 

$$\hat{\delta} = c_0 \mathrm{tr}(A) + c_0 c \left\{ p |A|^{1/p} - \mathrm{tr}(A) \right\},$$

where $c_0$ is a known positive constant and $c$ is a constant chosen so that 
the mean squared error (MSE) of $\hat{\delta}$ is minimized. Calculations show that 

$$\mathrm{MSE}(\hat{\delta}) = \mathrm{MSE}(\tilde{\delta}) + c\beta_1 + c^2 \beta_2,$$
where
$$\beta_1 = 2 c_0 E\left[ (c_0 \mathrm{tr}(A) - \delta)\left( p|A|^{1/p} - \mathrm{tr}(A) \right) \right] \qquad (10.13)$$
and
$$\beta_2 = c_0^2 E\left[ p|A|^{1/p} - \mathrm{tr}(A) \right]^2. \qquad (10.14)$$

Thus $\mathrm{MSE}(\hat{\delta})$ is minimized at $c = -\beta_1/(2\beta_2)$ and the minimum value is 
given by $\mathrm{MSE}(\tilde{\delta}) - \beta_1^2/(4\beta_2)$. This shows that, at the optimal $c$, $\hat{\delta}$ is never 
worse than the usual estimator in the sense of MSE, and is strictly better 
whenever $\beta_1 \neq 0$. The estimate of $c$ is given by $\hat{c} = -\hat{\beta}_1/(2\hat{\beta}_2)$, where 
$\hat{\beta}_1$ and $\hat{\beta}_2$ are obtained by calculating 
the expectations in (10.13) and (10.14) using the numerous properties 
given in Section 9.1 and then replacing $R$ by the usual estimator $c_0 A$. 
It can be noted from Fang and Anderson (1990, page 208) that the 
estimators $\hat{\beta}_1$ and $\hat{\beta}_2$ are the maximum likelihood estimators of $\beta_1$ and 
$\beta_2$, respectively, provided $\hat{R} = c_0 A$ and $c_0 = 1/(n + 1)$. 
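
The quadratic form of the MSE in $c$ can be checked in a few lines. The following is a derivation sketch only; the shorthand $X_A = p|A|^{1/p} - \mathrm{tr}(A)$ is introduced here for convenience and is not notation from the cited papers.

```latex
% Write the improved estimator as the usual one plus a correction term:
%   \hat{\delta} - \delta = (c_0 \mathrm{tr}(A) - \delta) + c\, c_0 X_A ,
%   X_A = p|A|^{1/p} - \mathrm{tr}(A)   (shorthand assumed here).
\begin{align*}
\mathrm{MSE}(\hat{\delta})
  &= E\big[(c_0\mathrm{tr}(A) - \delta) + c\,c_0 X_A\big]^2 \\
  &= E\big[c_0\mathrm{tr}(A) - \delta\big]^2
     + c \cdot \underbrace{2c_0 E\big[(c_0\mathrm{tr}(A) - \delta) X_A\big]}_{\beta_1}
     + c^2 \cdot \underbrace{c_0^2 E\big[X_A^2\big]}_{\beta_2}. \\
\intertext{Setting the derivative with respect to $c$ to zero gives}
\beta_1 + 2c\beta_2 &= 0
  \quad\Longrightarrow\quad c = -\frac{\beta_1}{2\beta_2},
\qquad
\min_c \mathrm{MSE}(\hat{\delta})
  = \mathrm{MSE}(c_0\mathrm{tr}(A)) - \frac{\beta_1^2}{4\beta_2}.
\end{align*}
```

Since $\beta_2 > 0$ whenever $X_A$ is not almost surely zero, the stationary point is a minimum.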

The following table, taken from Joarder and Singh (1997), presents the 
percent relative efficiency of $\hat{\delta}$ over $\tilde{\delta}$. 

   ν     R = diag(1,1,1)    R = diag(4,2,1)    R = diag(25,1,1) 

   5         105.32             130.31             153.90 
  10         102.13             117.56             148.76 
  15         101.53             112.07             127.15 


The numbers are from a Monte Carlo study carried out by generating 
100 Wishart matrices from the multivariate t model with n = 25 and 
p = 3. 


10.7 Generalized Multivariate t Model 


Consider the generalized multivariate t model (9.7) discussed in the 
preceding chapter. The usual estimator of $R$ is a multiple of the Wishart 
matrix of the form $\tilde{R} = c_0 A$, where $c_0 > 0$. Joarder and Ahmed (1998) 
proposed improved estimators for $R$ as well as its trace and inverse under 
the quadratic loss function. The proposed estimators for $R$ are 

$$\hat{R} = c_0 A - c |A|^{1/p} I, \qquad (10.15)$$

where $I$ is an identity matrix and $c$ is chosen such that $\hat{R}$ is positive 
definite. For an estimator $R^*$ of $R$, let $L(R^*, R) = \mathrm{tr}[(R^* - R)^2]$ denote 
the quadratic loss function and let $R(R^*, R) = E L(R^*, R)$ denote the 
corresponding risk function. The relationship between $\hat{R}$ and $\tilde{R}$ is rather 
involved. Defining the dominance of one estimator over another in the 
same manner as in Section 10.6.1, Joarder and Ahmed (1998) established 
that $\hat{R}$ dominates $\tilde{R}$ for any $c$ satisfying $d < c < 0$, where 


= np +2 Dp ((n — 1)/2 + 1/p) 
nS (a ~ o ou 


with $c_0 < p\gamma/((n - 1)p + 2)$, or $0 < c < d$, where $d$ is given by (10.16) 
with $c_0 > p\gamma/(np + 2)$, $\gamma = \gamma_2/\gamma_4$, and $\gamma_i = E(\tau^i)$, $i = 1, 2, 3, 4$. 
The risk functions of the two estimators are given by 


à I, (n/2 + 2/p) Jp dtr (R/p) 
a (Rem) = ame RE (e ie) 
+ {1 + (n — 1)coy (con — 27)} tr (R?) 
+(n — leey, (trR)? 


and 

$$R(\tilde{R}, R) = \{1 + (n - 1) c_0 \gamma_4 (c_0 n - 2\gamma)\} \, \mathrm{tr}(R^2) + (n - 1) c_0^2 \gamma_4 (\mathrm{tr} R)^2,$$

respectively. Now consider estimating the trace $\delta = \mathrm{tr} R$. The usual and 
the proposed estimators are $\tilde{\delta} = c_0 \mathrm{tr} A$ and $\hat{\delta} = c_0 \mathrm{tr} A - cp|A|^{1/p}$, 
respectively, where $c_0 > 0$ and $c$ is such that the proposed estimator is 
positive. Joarder and Ahmed (1998) established that the corresponding 
risk functions are given by 


$$R(\tilde{\delta}, \delta) = \left[ (n - 1) c_0 \{ (n - 1) c_0 \gamma_4 - 2\gamma_2 \} + 1 \right] \delta^2 + 2(n - 1) c_0^2 \gamma_4 \, \mathrm{tr}(R^2)$$

and 


OEE o mh (-— 


respectively. It is evident that $\hat{\delta}$ dominates $\tilde{\delta}$. Finally, consider 
estimating the inverse $\Psi = R^{-1}$ with the usual and the proposed estimators 
given by $\tilde{\Psi} = c_0 A^{-1}$ and $\hat{\Psi} = c_0 A^{-1} - c |A|^{-1/p} I$, respectively, 
where $c_0 > 0$ and $c$ is such that the proposed estimator is positive 
definite. In this case, it turns out that $\hat{\Psi}$ dominates $\tilde{\Psi}$ for any $c$ satisfying 
$d < c < 0$, where 


7 Co _ Y-2\ Tp ((n = 1)/2 - 1/p) 
eS eas oy ACER EEETA oa, 


with $c_0 < (n - 2/p - p - 2)\gamma_{-2}/\gamma_{-4}$, or $0 < c < d$, where $d$ is given by 
(10.17) with $c_0 > (n - 2/p - p - 2)\gamma_{-2}/\gamma_{-4}$ and $\gamma_i = E(\tau^i)$. 


10.8 Simulation 


Simulation is a key element in modern statistical theory and applica- 
tions. In this section, we describe three known approaches for simulat- 
ing from multivariate ¢ distributions. Undoubtedly, many other methods 
will be proposed and elaborated in the near future. 


10.8.1 Vale and Maurelli's Method 


Fleishman (1978) noted that the real-world distributions of (univariate) 
variables are typically characterized by their first four moments (that 
is, mean, variance, skewness, and kurtosis). He presented a procedure 
for generating nonnormal random numbers with these four moments 
specified. He accomplished this by taking a nonnormal variable $X$ as a 
linear combination of the first three powers of a standard normal random 
variable $Z$: 

$$X = a + bZ + cZ^2 + dZ^3. \qquad (10.18)$$

To determine the constants, Fleishman expanded (10.18) to express the 
first four moments of $X$ in terms of the first 14 moments of $Z$. After 
considerable algebraic manipulation, Fleishman was able to represent 
the solution to the constants of (10.18) as a system of nonlinear 
equations. For a standard distribution (that is, with mean zero and variance 
one), the constants $b$, $c$, and $d$ are found by simultaneously solving the 
following equations: 

$$b^2 + 6bd + 2c^2 + 15d^2 = 1, \qquad (10.19)$$

$$2c\left(b^2 + 24bd + 105d^2 + 2\right) = \gamma_1, \qquad (10.20)$$
and

$$24\left\{ bd + c^2\left(1 + b^2 + 28bd\right) + d^2\left(12 + 48bd + 141c^2 + 225d^2\right) \right\} = \gamma_2, \qquad (10.21)$$

where $\gamma_1$ is the desired skewness and $\gamma_2$ is the desired kurtosis 
(measured as excess over the normal value, so that $b = 1$, $c = d = 0$ gives 
$\gamma_1 = \gamma_2 = 0$). The constant $a$ in (10.18) is determined by 

$$a = -c. \qquad (10.22)$$


Univariate nonnormal random numbers are then generated by drawing 
normal random numbers and transforming them using the constants a, 
b, c, and d in (10.18). 
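
As an illustration of how (10.19)-(10.22) might be solved in practice, here is a minimal Python sketch. It is not Fleishman's own procedure: the Newton iteration, the finite-difference Jacobian, the starting point (b, c, d) = (1, 0, 0), and all function names are assumptions made for this example, and convergence is only to be expected for targets not too far from the normal case.

```python
def fleishman_residuals(b, c, d, skew, kurt):
    """Left-hand sides of (10.19)-(10.21) minus their right-hand sides."""
    f1 = b**2 + 6*b*d + 2*c**2 + 15*d**2 - 1.0
    f2 = 2*c*(b**2 + 24*b*d + 105*d**2 + 2) - skew
    f3 = 24*(b*d + c**2*(1 + b**2 + 28*b*d)
             + d**2*(12 + 48*b*d + 141*c**2 + 225*d**2)) - kurt
    return [f1, f2, f3]


def solve3(A, rhs):
    """Solve a 3x3 linear system by Gaussian elimination with pivoting."""
    M = [row[:] + [r] for row, r in zip(A, rhs)]
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, 3):
            factor = M[r][col] / M[col][col]
            for k in range(col, 4):
                M[r][k] -= factor * M[col][k]
    x = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):
        s = M[i][3] - sum(M[i][k] * x[k] for k in range(i + 1, 3))
        x[i] = s / M[i][i]
    return x


def fleishman_coefficients(skew, kurt, tol=1e-12, max_iter=100):
    """Return (a, b, c, d); kurt is excess kurtosis. Hypothetical helper."""
    x = [1.0, 0.0, 0.0]          # start from the identity transform X = Z
    h = 1e-7                     # step for the forward-difference Jacobian
    for _ in range(max_iter):
        f = fleishman_residuals(x[0], x[1], x[2], skew, kurt)
        if max(abs(v) for v in f) < tol:
            break
        J = [[0.0] * 3 for _ in range(3)]
        for j in range(3):
            xp = x[:]
            xp[j] += h
            fp = fleishman_residuals(xp[0], xp[1], xp[2], skew, kurt)
            for i in range(3):
                J[i][j] = (fp[i] - f[i]) / h
        step = solve3(J, [-v for v in f])
        x = [xi + si for xi, si in zip(x, step)]
    b, c, d = x
    return -c, b, c, d           # a = -c by (10.22)


def fleishman_transform(z, coeffs):
    """Apply (10.18) to a standard normal draw z."""
    a, b, c, d = coeffs
    return a + b*z + c*z**2 + d*z**3
```

A caller would check the residuals after solving, since the loop simply stops after `max_iter` iterations if the tolerance is not reached.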

Vale and Maurelli (1983) extended Fleishman's procedure to multivariate 
nonnormal distributions with specified intercorrelations as well 
as specified moments. The procedure begins by specifying the constants 
necessary for Fleishman's procedure. For each variable independently, 
these are given by the solution of (10.19)-(10.22). Define two variables 
$Z_1$ and $Z_2$ as from standard normal populations, and define the 
vector $z$ as $z^T = [1, Z, Z^2, Z^3]$. The weight vector $w^T$ contains the power 
function weights $a$, $b$, $c$, and $d$: $w^T = [a, b, c, d]$. The nonnormal variable 
$X$ then becomes $X = w^T z$. If $r_{X_1 X_2}$ denotes the correlation between 
two nonnormal variables $X_1$ and $X_2$ corresponding to the normal 
variables $Z_1$ and $Z_2$, it is then easily seen that $r_{X_1 X_2} = w_1^T R \, w_2$, where 
$X_1 = w_1^T z_1$, $X_2 = w_2^T z_2$, and $R$ is the expected matrix product of $z_1$ 
and $z_2^T$ given by 

$$R = \begin{pmatrix}
1 & 0 & 1 & 0 \\
0 & r_{Z_1 Z_2} & 0 & 3 r_{Z_1 Z_2} \\
1 & 0 & 2 r_{Z_1 Z_2}^2 + 1 & 0 \\
0 & 3 r_{Z_1 Z_2} & 0 & 6 r_{Z_1 Z_2}^3 + 9 r_{Z_1 Z_2}
\end{pmatrix}.$$

Collecting the terms and using (10.22), a third-degree polynomial in 
$r_{Z_1 Z_2}$, the correlation between the normal variables $Z_1$ and $Z_2$, results: 

$$r_{X_1 X_2} = r_{Z_1 Z_2} \left( b_1 b_2 + 3 b_1 d_2 + 3 d_1 b_2 + 9 d_1 d_2 \right) + 2 c_1 c_2 r_{Z_1 Z_2}^2 + 6 d_1 d_2 r_{Z_1 Z_2}^3.$$

Solving this polynomial for $r_{Z_1 Z_2}$ provides the correlation required to 
obtain the desired post-transformation correlation $r_{X_1 X_2}$. These 
correlations can then be assembled into a matrix of intercorrelations, and this 
matrix can be decomposed to yield multivariate normal random numbers 
for input into Fleishman's transformation procedure. 
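
The step of solving the cubic for the intermediate normal correlation can be sketched as follows. This is an illustrative Python fragment under stated assumptions: the function names are hypothetical, the weights are passed as tuples (a, b, c, d), and plain bisection on [-1, 1] is used in place of whatever root finder the original authors employed; bisection returns one root in the bracket and assumes the polynomial changes sign there.

```python
def post_transform_correlation(r, w1, w2):
    """Cubic giving r_{X1X2} as a function of the normal correlation r.

    w1, w2 are Fleishman weight tuples (a, b, c, d); a is unused because
    the polynomial already incorporates a = -c from (10.22).
    """
    _, b1, c1, d1 = w1
    _, b2, c2, d2 = w2
    return (r * (b1*b2 + 3*b1*d2 + 3*d1*b2 + 9*d1*d2)
            + 2*c1*c2 * r**2
            + 6*d1*d2 * r**3)


def intermediate_correlation(target, w1, w2, tol=1e-12):
    """Find r in [-1, 1] whose post-transformation correlation is target."""
    lo, hi = -1.0, 1.0
    f_lo = post_transform_correlation(lo, w1, w2) - target
    f_hi = post_transform_correlation(hi, w1, w2) - target
    if f_lo * f_hi > 0:
        raise ValueError("target correlation is not attainable on [-1, 1]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = post_transform_correlation(mid, w1, w2) - target
        if f_lo * f_mid <= 0:
            hi = mid             # root lies in the lower half
        else:
            lo, f_lo = mid, f_mid
    return 0.5 * (lo + hi)
```

With identity weights (0, 1, 0, 0) for both variables the cubic reduces to r itself, so the intermediate correlation equals the target, which gives a quick sanity check.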


10.8.2 Vaduva’s Method 


Vaduva (1985) provided a general algorithm for generating from 
multivariate distributions and illustrated its applicability for multivariate 
normal, Dirichlet, and multivariate t distributions. Here, we present a 
specialized version of the algorithm for generating the p-variate t 
distribution with the joint pdf 

$$f(x) = \frac{\Gamma((\nu + p)/2)}{(\pi\nu)^{p/2} \, \Gamma(\nu/2) \, |R|^{1/2}} \left[ 1 + \frac{1}{\nu} x^T R^{-1} x \right]^{-(\nu + p)/2}$$

over some domain $D$ in $\mathbb{R}^p$. It is as follows. 

(i) Initialize. 

(ii) Determine an interval I = [u§, vg] x --- x [up, vp], where 


Uo = 0, 
py =~, ds 
1 
v? = Aia ) i=l, P, 
v-—1 
and 
1 
v} — Or ) =1,. »P 
(iii) Generate the random vector $V^*$ uniformly distributed over $I$. If 
RND is a uniform random number generator, then $V^*$ may be 
generated as follows: 

(a) Generate $U_0, U_1, \ldots, U_p$ uniformly distributed over [0, 1] 
and stochastically independent. 
(b) Calculate $V_i^* = v_i^0 + (v_i^1 - v_i^0) U_i$, $i = 0, 1, \ldots, p$. 
(c) Take $V^* = (V_0^*, V_1^*, \ldots, V_p^*)^T$. 
(iv) If $V^* \notin D$, then go to step (iii). 
(v) Otherwise, take $V = V^*$. 
(vi) Calculate $Y_i = V_i/V_0$, $i = 1, \ldots, p$. 
(vii) Take $X = (Y_1, \ldots, Y_p)^T$. Stop. 

Note that the steps from (iii) to (v) constitute a rejection algorithm. 
The performance of this algorithm is characterized by the probability to 
accept V*. This probability can be calculated in the form 


z nP/2T (v/2) (: = 
Po D+ DE (V+ p)/2 PF ” 


which yields 

$$\lim_{\nu \to \infty} p_a = 0$$

and
$$\lim_{p \to \infty} p_a = 0,$$

indicating inadequate behavior of the algorithm for large values of $p$ 
and/or $\nu$. 


10.8.3 Simulation Using BUGS 


A relatively simple way to generate a multivariate t involves sampling 
z from gamma(ν/2, ν/2) and then sampling a multivariate normal 
y ~ N_p(μ, R/z). This mode of generation reflects the scale mixture 
form of the multivariate t pdf. In BUGS the multivariate normal is 
parameterized by the precision matrix P; thus one programs a multivariate 
t pdf as follows to generate a sample of n cases (for Sigma[,], nu.2 and 
mu[] known) 


   for (i in 1:n) 
      {z[i] ~ dgamma(nu.2, nu.2) 
       y[i, 1:q] ~ dmnorm(mu[], P.sc[,])} 
   for (i in 1:q) {for (j in 1:q) 
      {P[i, j] <- inverse(Sigma[,], i, j) 
       P.sc[i, j] <- z[i] * P[i, j]}} 


If one has observed multivariate data and wishes to assume multivariate 
t sampling, then in BUGS the dmt() form is available 


   for (i in 1:n) {y[i, 1:q] ~ dmt(mu[], P[,], nu)} 


where nu is assumed known. 
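
For readers without BUGS, the same scale mixture construction can be sketched in plain Python. This is an illustrative sketch, not a translation of the BUGS program above: the function names `cholesky` and `rmvt` are hypothetical, only the standard library is used, and R is assumed symmetric positive definite. Note that `random.gammavariate` takes a shape and a scale, so a gamma(ν/2, ν/2) variate in the rate parameterization corresponds to scale 2/ν.

```python
import math
import random


def cholesky(R):
    """Lower-triangular L with L L^T = R (R symmetric positive definite)."""
    p = len(R)
    L = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(R[i][i] - s)
            else:
                L[i][j] = (R[i][j] - s) / L[j][j]
    return L


def rmvt(n, mu, R, nu, rng=None):
    """Draw n vectors: z ~ gamma(nu/2, rate nu/2), then y | z ~ N(mu, R/z)."""
    rng = rng or random.Random()
    p = len(mu)
    L = cholesky(R)
    draws = []
    for _ in range(n):
        z = rng.gammavariate(nu / 2.0, 2.0 / nu)   # shape nu/2, scale 2/nu
        e = [rng.gauss(0.0, 1.0) for _ in range(p)]
        scale = 1.0 / math.sqrt(z)
        y = [mu[i] + scale * sum(L[i][k] * e[k] for k in range(i + 1))
             for i in range(p)]
        draws.append(y)
    return draws
```

For ν > 2 the draws have covariance νR/(ν − 2), which gives a simple check on a large sample.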


11 


Regression Models 


There is a large number of contributions (scattered in the literature and 
many of them motivated by economic applications) dealing with regres- 
sion models with the error term distributed according to the multivariate 
t distribution. In this chapter, we shall discuss several of them. 


11.1 Classical Linear Model 


Let the model for $n$ observations $Y = (y_1, \ldots, y_n)^T$ be 
$$Y = X\beta + \epsilon, \qquad (11.1)$$

where $X$ is an $n \times p$ design matrix with rank $p$, $\beta$ is a $p \times 1$ vector of 
regression parameters with unknown values, and $\epsilon$ is an $n \times 1$ random 
error vector. For the usual t regression model it is assumed that the $n$ 
elements of $\epsilon$ have the multivariate t pdf 

$$f(\epsilon) = \frac{\nu^{\nu/2} \, \Gamma((n + \nu)/2)}{\pi^{n/2} \sigma^n \, \Gamma(\nu/2)} \left[ \nu + \frac{\epsilon^T \epsilon}{\sigma^2} \right]^{-(n + \nu)/2}.$$

In practice, there are several situations in which the model (11.1) is 
useful. Under (11.1), the least squares estimate of $\beta$ is 

$$\hat{\beta} = (X^T X)^{-1} X^T Y. \qquad (11.2)$$

Zellner (1976) noted that this is also the maximum likelihood estimate 
of $\beta$. From Singh (1991), $\hat{\beta}$ is a minimum variance linear unbiased 
estimator and also a minimum variance unbiased estimator. The 
variance-covariance matrix for $\hat{\beta}$ is 

$$\mathrm{Var}(\hat{\beta}) = E(\hat{\beta} - \beta)(\hat{\beta} - \beta)^T = \frac{\nu\sigma^2}{\nu - 2} (X^T X)^{-1}. \qquad (11.3)$$


Note that as $\nu \to \infty$, the above variance approaches $(X^T X)^{-1} \sigma^2$, which 
is the variance-covariance matrix in the normal case. Thus, for small 
and moderate values of $\nu$, the variances of the elements of $\hat{\beta}$ are inflated 
considerably, as compared to those for large values of $\nu$. 
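Singh (1988) provided the following estimate of the degrees of freedom 
parameter: 

$$\hat{\nu} = \frac{2(2\hat{a} - 3)}{\hat{a} - 3},$$

where 

$$\hat{a} = \frac{(1/n) \sum_{t=1}^{n} \left( y_t - x_t^T \hat{\beta} \right)^4}{\left\{ (1/n) \sum_{t=1}^{n} \left( y_t - x_t^T \hat{\beta} \right)^2 \right\}^2}$$

and $\hat{\beta}$ is the least squares estimator given by (11.2). 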
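
The estimate above is a ratio of residual moments followed by one algebraic step, as the following Python sketch illustrates. The function name and the guard against ratios not exceeding 3 are assumptions of this example (the formula has a pole at a = 3 and gives non-positive or undefined values below it); the residuals are taken as given, for instance from a least squares fit.

```python
def singh_df_estimate(residuals):
    """Moment estimate of the degrees of freedom from regression residuals.

    Computes a = m4 / m2^2 from the raw second and fourth residual moments
    and returns 2(2a - 3)/(a - 3). Hypothetical helper name.
    """
    n = len(residuals)
    m2 = sum(e ** 2 for e in residuals) / n
    m4 = sum(e ** 4 for e in residuals) / n
    a = m4 / m2 ** 2
    if a <= 3.0:
        # The normal distribution has a = 3; lighter tails give no
        # finite positive estimate.
        raise ValueError("kurtosis ratio must exceed 3")
    return 2.0 * (2.0 * a - 3.0) / (a - 3.0)
```

For example, residuals whose kurtosis ratio is 5 give an estimate of 7, consistent with the t distribution's kurtosis 3(ν − 2)/(ν − 4) at ν = 7.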
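The maximum likelihood estimate of $\sigma^2$ is 

$$\hat{\sigma}^2 = \frac{1}{n} (y - X\hat{\beta})^T (y - X\hat{\beta}),$$
as in the normal case. For $\nu > 2$, 

$$E(\hat{\sigma}^2) = \frac{(n - p)\sigma_u^2}{n},$$
where $\sigma_u^2 = \nu\sigma^2/(\nu - 2)$ is the common variance of the elements of $\epsilon$. 
Thus, $\hat{\epsilon}^T \hat{\epsilon}/(n - p)$, with $\hat{\epsilon} = y - X\hat{\beta}$, is an unbiased estimator for $\sigma_u^2$ while 
$$\tilde{\sigma}^2 = \frac{\nu - 2}{\nu(n - p)} \hat{\epsilon}^T \hat{\epsilon} \qquad (11.4)$$
is an unbiased estimator for $\sigma^2$. In the class of estimators $q \hat{\epsilon}^T \hat{\epsilon}$, with 
$q$ being a positive scalar, the minimal mean squared error estimator for 
$\sigma^2$ is (with $\nu > 4$) 

$$\bar{\sigma}^2 = \frac{\nu - 4}{\nu(n - p + 2)} \hat{\epsilon}^T \hat{\epsilon}, \qquad (11.5)$$

while the minimal mean squared error estimator for $\sigma_u^2$ in this class is 
$(\nu - 4)\hat{\epsilon}^T \hat{\epsilon}/\{(\nu - 2)(n - p + 2)\}$. The variances of the unbiased and the 
minimal mean squared error estimators of $\sigma^2$ are 

$$\mathrm{Var}(\tilde{\sigma}^2) = \frac{2\sigma^4 (n - p + \nu - 2)}{(n - p)(\nu - 4)} \qquad (11.6)$$
and
$$\mathrm{Var}(\bar{\sigma}^2) = \frac{2\sigma^4 (n - p)(\nu - 4)(n - p + \nu - 2)}{(\nu - 2)^2 (n - p + 2)^2}, \qquad (11.7)$$

respectively. Since $\hat{\epsilon}^T \hat{\epsilon}/(n - p)$ is an unbiased estimator for $\sigma_u^2$, the variance (11.3) 
can be unbiasedly estimated by 
$$\widehat{\mathrm{Var}}(\hat{\beta}) = (X^T X)^{-1} \frac{\hat{\epsilon}^T \hat{\epsilon}}{n - p}.$$
Similarly, (11.4) and (11.5) can be estimated, on substituting the estimate 
$\hat{\nu} = 2(2\hat{a} - 3)/(\hat{a} - 3)$ for $\nu$, by 

$$\hat{\tilde{\sigma}}^2 = \frac{\hat{a}}{(2\hat{a} - 3)(n - p)} \hat{\epsilon}^T \hat{\epsilon} \qquad (11.8)$$
and
$$\hat{\bar{\sigma}}^2 = \frac{3}{(2\hat{a} - 3)(n - p + 2)} \hat{\epsilon}^T \hat{\epsilon}, \qquad (11.9)$$

respectively. The estimates for the variances given by (11.6)-(11.7) are 

$$\widehat{\mathrm{Var}}(\tilde{\sigma}^2) = \frac{2 (\sigma^{*2})^2}{n - p} \cdot \frac{n - p + \hat{\nu} - 2}{\hat{\nu} - 4}$$
and
$$\widehat{\mathrm{Var}}(\bar{\sigma}^2) = \frac{2(n - p)(\hat{\nu} - 4)(n - p + \hat{\nu} - 2)}{(\hat{\nu} - 2)^2 (n - p + 2)^2} (\sigma^{*2})^2,$$

respectively, where $\sigma^{*2}$ may be taken as the estimate given in (11.8) or as that given 
in (11.9). 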
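
The constants in (11.4) and (11.5) can be recovered by a short moment calculation. The sketch below assumes the scale mixture representation of the model, namely that the residual sum of squares $S = \hat{\epsilon}^T \hat{\epsilon}$ is distributed as $\sigma^2 \chi^2_{n-p}/z$ with $z \sim \mathrm{gamma}(\nu/2, \nu/2)$ independent of the chi-squared variable; the symbols $S$ and $m = n - p$ are shorthand introduced here.

```latex
% Assumed representation: S = \sigma^2 \chi^2_m / z,  m = n - p,
% with E(1/z) = \nu/(\nu-2) and E(1/z^2) = \nu^2/\{(\nu-2)(\nu-4)\}.
\begin{align*}
E(S)   &= \sigma^2\, m\, \frac{\nu}{\nu-2}, &
E(S^2) &= \sigma^4\, m(m+2)\, \frac{\nu^2}{(\nu-2)(\nu-4)}.
\intertext{Unbiasedness, $E(qS) = \sigma^2$, gives the constant in (11.4):}
q &= \frac{\nu-2}{\nu\, m}.
\intertext{Minimizing $E(qS - \sigma^2)^2$ over $q$ gives
$q = \sigma^2 E(S)/E(S^2)$, the constant in (11.5):}
q &= \frac{\nu-4}{\nu\,(m+2)}.
\end{align*}
```

The same two moments yield (11.6) and (11.7) through $\mathrm{Var}(qS) = q^2\{E(S^2) - E(S)^2\}$.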

It is important to note that, even though the elements of $\epsilon$ have a 
nonnormal pdf and are not independent, tests and intervals based on 
the usual t and F statistics still remain valid. For example, 
$t = (\hat{\beta}_i - \beta_i)/\{s \sqrt{m^{ii}}\}$, where $m^{ii}$ is the $(i, i)$th element of $(X^T X)^{-1}$ and 
$s^2 = \hat{\epsilon}^T \hat{\epsilon}/(n - p)$, has the usual Student's t distribution, and thus probability 
statements based on this statistic will be appropriate. Also, $s^2/\sigma^2$ has 
the usual F distribution with degrees of freedom $n - p$ and $\nu$. This fact 
can be used to construct confidence intervals for and test hypotheses 
about $\sigma^2$. 

Singh et al. (1995) proposed the generalized estimator $\hat{\beta}_g = g(t)\hat{\beta}$ 
for $\beta$, where $t = \hat{\epsilon}^T \hat{\epsilon}/(\hat{\beta}^T X^T X \hat{\beta})$ has at least the first $k > 6$ moments 
finite and $g(t)$, satisfying the validity conditions of Taylor's series 
expansion and having the first three derivatives with respect to $t$ bounded, 
is a bounded function of $t$ such that $g(0) = 1$ and $g(t) = O(1)$ as 
$\delta = \beta^T X^T X \beta \to \infty$. It should be noted that the maximum likelihood 
estimator $\hat{\beta}$ and the estimators considered by Singh (1991) are 
all particular cases of $\hat{\beta}_g$. Singh et al. (1995) investigated the bias 
and the risk 
$\rho(\hat{\beta}_g) = E[(\hat{\beta}_g - \beta)^T Q (\hat{\beta}_g - \beta)]$ of the generalized estimator when 
$Q$ is a positive definite matrix. It was established that 


x 7 (n — p)vo?g' (0) vo? (p — 2) 
E (ĝ,) = ß+ TO k- bw — 4) 


ee aN Oleo (=) 


28(v — 4)g' (0) 03 
and 
A vo? -1 (n — pyy’o%g'(0) 
P(B) = Fy {(X7x)" @} + Tae 


x [a {(x7x) Q} + naps e] 


Since $\rho(\hat{\beta}_g) < \rho(\hat{\beta})$, one observes that $\hat{\beta}_g$ dominates the maximum 
likelihood estimator $\hat{\beta}$. Also, in the class $\hat{\beta}_g$, there exist better 
estimators than those considered in Singh (1991). 

Sutradhar (1988b) considered testing $H_0: C\beta = 0$ versus $H_1: C\beta \neq 0$, 
using the classical F statistic 

$$W = \frac{W_2 - W_1}{W_1},$$
where
$$W_1 = Y^T \left( I_n - X (X^T X)^{-1} X^T \right) Y$$
is the residual sum of squares of the full model (11.1) and 
$$W_2 = Y^T \left( I_n - Z (Z^T Z)^{-1} Z^T \right) Y$$

is the residual sum of squares of the reduced model 
$$E(Y) = Z\alpha, \qquad (11.10)$$

which is obtained from (11.1) by using the restriction under $H_0$. In $H_0$, 
$C$ is an $r \times p$ matrix of known coefficients with rank$(C) = q$, and, in 
the reduced model (11.10), $Z$ denotes the new design matrix of order 
$n \times (p - q)$ and $\alpha$ is a vector of $(p - q)$ parameters. Sutradhar (1988b) 
established that the pdf of $W$ is given by 


Do v q n-p 
f(w) =z yaa 2 Bora (k +1,5 -1) Bo GE: 2 i 
where 
_ T(a+b)227} 
felad = Taro +a 
and 6 is the noncentrality parameter given by 
_ Yr-2a7lyry _ yr T7\-1 oT 
ô = <P" (XTX - x72 (Z7Z) Z"X) B. 


Sutradhar also computed the corresponding power of the test, yielding 
the expression 


2 a ù a 1 


where $u_0 = 1/[1 + (q/(n - p)) F_{q, n-p, \alpha}]$ and $I_u(a, b)$ denotes the 
incomplete beta function ratio. As $\nu \to \infty$, this expression reduces to the 
power of the F test under normality (Tiku, 1967). 

The distribution of future responses given a set of data from an 
informative experiment is known as a predictive distribution. Haq and Khan 
(1990) derived the predictive distribution for (11.1). Rewrite (11.1) in 
the equivalent form $y = \beta X + \sigma e$ and let $y_f$ be a future response 
corresponding to the design vector $x_f$, that is, $y_f = \beta x_f + \sigma e_f$. Haq and 
Khan (1990) showed that the predictive pdf of $y_f$ is given by 


f (ys ly) 


x [2 +s~*(y) {yz — Bly) xs} (1— xf A7*xy) {ys — blys} 


’ 


joe 


where $b(e) = e X^T (X X^T)^{-1}$, $s^2(e) = (e - \hat{e})(e - \hat{e})^T$, $\hat{e} = b(e) X$, and 
$A = X X^T + x_f x_f^T$. Thus, for the given informative data $y$, the predictive 
distribution of $y_f$ is t with mean $b(y) x_f$, variance 
$(n - p) s^2(y)/\{(n - p - 2)(1 - x_f^T A^{-1} x_f)\}$, and degrees of freedom 
$n - p$. A prediction interval of the desired coverage probability can 
easily be obtained by using the standard t-table. Note that the predictive 
distribution does not depend on the degrees of freedom parameter 
of the original t distribution. For a set of $n'$ future responses given by 
$Y_f = \beta X_f + \sigma e_f$, Haq and Khan (1990) noted similarly that the predictive 
distribution of $Y_f$ is $n'$-variate t with mean vector $b(y) X_f$, 
variance-covariance matrix $|I_{n'} - X_f^T Q^{-1} X_f|^{1/2} s(y)$ (where $Q = X X^T + X_f X_f^T$), 
and degrees of freedom $n - p$. It is to be noted that the distribution 
of $(n - p) s^{-2}(y) (Y_f - b(y) X_f)(I_{n'} - X_f^T Q^{-1} X_f)(Y_f - b(y) X_f)^T$ 
is F with degrees of freedom $n'$ and $n - p$. This distribution can be 
utilized for determining the prediction region for a set of future responses 
with any desired coverage probability. 


11.2 Bayesian Linear Models 


In his classical paper, Zellner (1976) provided a Bayesian analysis of the 
linear model (11.1). Consider the diffuse prior for $\beta$ and $\sigma^2$, that is, 

$$p(\beta, \sigma^2) \propto \frac{1}{\sigma^2}, \qquad (11.11)$$

where $0 < \sigma^2 < \infty$ and $\beta_i \in \mathbb{R}$, $i = 1, \ldots, p$. Then, assuming that $\nu$ is 
known, the posterior pdf of the parameters is 

$$p(\beta, \sigma^2 \mid y, \nu) \propto \frac{\left[ \nu\sigma^2 / A(\beta) \right]^{(\nu - 2)/2}}{A(\beta)^{(n + 2)/2} \left[ 1 + \nu\sigma^2 / A(\beta) \right]^{(n + \nu)/2}},$$

where $A(\beta) = (y - X\beta)^T (y - X\beta)$. It follows that the conditional 
posterior pdf of $\beta$ given $\sigma^2$ and $\nu$ is in the form of a multivariate t pdf 
with mean $\hat{\beta}$ (the least squares estimate in (11.2)). The corresponding 
conditional posterior covariance matrix is given by 

$$\mathrm{Var}(\beta \mid y, \sigma^2, \nu) = \frac{\nu\sigma^2 + (n - p) s^2}{n - p + \nu - 2} (X^T X)^{-1},$$

provided that $n - p + \nu > 2$, where $(n - p) s^2 = (y - X\hat{\beta})^T (y - X\hat{\beta})$. As 
$\nu \to \infty$, the conditional posterior pdf for $\beta$ given $\sigma^2$ approaches a 
multivariate normal pdf with mean $\hat{\beta}$ and covariance matrix $(X^T X)^{-1} \sigma^2$, 
which is the usual result for the normal regression model with the diffuse 
prior pdf (11.11). The marginal posterior pdf for $\beta$ is 

$$p(\beta \mid y, \nu) \propto \left\{ (n - p) s^2 + (\beta - \hat{\beta})^T X^T X (\beta - \hat{\beta}) \right\}^{-n/2}, \qquad (11.12)$$

which is in the form of a p-dimensional t pdf and does not depend on 
the value of $\nu$. In fact, (11.12) is precisely the result that one obtains 
in the Bayesian analysis of the normal regression model with the diffuse 
prior for the parameters shown in (11.11). The marginal posterior pdf 
for $\sigma^2$ is 

$$p(\sigma^2 \mid y, \nu) \propto \left( \frac{\sigma^2}{s^2} \right)^{(\nu - 2)/2} \left( 1 + \frac{\nu\sigma^2}{(n - p) s^2} \right)^{-(n - p + \nu)/2},$$

from which it follows that $\sigma^2/s^2$ has the F pdf with degrees of freedom 
$\nu$ and $n - p$, a result paralleling the classical results mentioned in the 
preceding section. From properties of the F distribution, the modal 
value of $\sigma^2/s^2$ is $((n - p)/\nu)((\nu - 2)/(n - p + 2))$ when $\nu > 2$, and its 
mean is $(n - p)/(n - p - 2)$ when $n - p > 2$. Also, as $\nu \to \infty$, the 
posterior distribution of $(n - p) s^2/\sigma^2$ approaches a chi-squared distribution 
with degrees of freedom $n - p$, a distributional result that holds for the 
Bayesian analysis of the usual normal regression model with diffuse prior 
assumptions. Finally, note that the posterior pdf for 
$n\sigma^2/\{(y - X\beta)^T (y - X\beta)\}$ is $F_{\nu, n}$. 

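The natural conjugate prior distribution for $\sigma^2$ and $\beta$ is the product 
of a marginal F pdf for $\sigma^2$ and a conditional p-dimensional t pdf for 
$\beta$ given $\sigma^2$, that is, 

$$p(\beta, \sigma^2 \mid \cdot) = p_F(\sigma^2 \mid \cdot) \, p_S(\beta \mid \sigma^2, \cdot), \qquad (11.13)$$

where 

$$p_F(\sigma^2 \mid \cdot) \propto \frac{(\sigma^2/s_a^2)^{(\nu - 2)/2}}{\left( 1 + \nu\sigma^2/(\nu_a s_a^2) \right)^{(\nu + \nu_a)/2}}$$

(where $\nu_a > 0$, $s_a > 0$, and $0 < \sigma < \infty$) and 

$$p_S(\beta \mid \sigma^2, \bar{\beta}, A, \bar{\nu}_a) \propto \bar{\sigma}^{-p} \left\{ \bar{\nu}_a + \frac{1}{\bar{\sigma}^2} (\beta - \bar{\beta})^T A (\beta - \bar{\beta}) \right\}^{-(\bar{\nu}_a + p)/2},$$

where $\beta_i \in \mathbb{R}$, $i = 1, \ldots, p$, $A$ is symmetric and positive definite, $\bar{\beta}$ is 
the prior mean vector, $\bar{\nu}_a = \nu + \nu_a$, and $\bar{\sigma}^2 = (\nu_a s_a^2 + \nu\sigma^2)/\bar{\nu}_a$. As 
with the natural conjugate for the usual normal regression model, it is 
seen that $\beta$ and $\sigma^2$ are not independent in the natural conjugate prior 
distribution in (11.13). If the natural conjugate prior distribution is 
thought to represent the available prior information adequately, it can 
be used for obtaining the posterior distribution; see the appendix in 
Zellner (1976) for details. 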
11.3 Indexed Linear Models 


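Lange et al. (1989) and Fernandez and Steel (1999) provided a 
far-reaching extension of (11.1) to handle the situation when the $y_i$'s are 
assumed to have the t distribution with degrees of freedom $\nu_i$ and 
parameters $\mu = g_i(\theta)$ and $R = h_i(\phi)$ indexed by some unknown parameters $\theta$ 
and $\phi$. Lange et al. (1989) suggested an EM algorithm for estimation. 
They also considered methods for computing standard errors, developed 
graphical diagnostic checks, and provided applications to a variety of 
problems. The problems include linear and nonlinear regression, robust 
estimation of the mean and covariance matrix with missing data, 
unbalanced multivariate repeated-measures data, multivariate modeling of 
pedigree data, and multivariate nonlinear regression. They also derived 
the expected information matrix for $(\theta, \phi, \nu)$ for one observation in the 
form 

$$E\left( -\frac{\partial^2 \log L}{\partial \theta_i \, \partial \theta_j} \right) = \frac{\nu + p}{\nu + p + 2} \, \frac{\partial \mu^T}{\partial \theta_i} R^{-1} \frac{\partial \mu}{\partial \theta_j},$$

$$E\left( -\frac{\partial^2 \log L}{\partial \phi_i \, \partial \phi_j} \right) = \frac{\nu + p}{2(\nu + p + 2)} \, \mathrm{tr}\left( R^{-1} \frac{\partial R}{\partial \phi_i} R^{-1} \frac{\partial R}{\partial \phi_j} \right) - \frac{1}{2(\nu + p + 2)} \, \mathrm{tr}\left( R^{-1} \frac{\partial R}{\partial \phi_i} \right) \mathrm{tr}\left( R^{-1} \frac{\partial R}{\partial \phi_j} \right),$$

$$E\left( -\frac{\partial^2 \log L}{\partial \phi_i \, \partial \nu} \right) = -\frac{1}{(\nu + p + 2)(\nu + p)} \, \mathrm{tr}\left( R^{-1} \frac{\partial R}{\partial \phi_i} \right),$$
and
$$E\left( -\frac{\partial^2 \log L}{\partial \nu^2} \right) = -\frac{1}{2} \left\{ \frac{1}{2} \psi'\left( \frac{\nu + p}{2} \right) - \frac{1}{2} \psi'\left( \frac{\nu}{2} \right) + \frac{p}{\nu(\nu + p)} - \frac{1}{\nu + p} + \frac{\nu + 2}{\nu(\nu + p + 2)} \right\},$$

where $\psi'(x) = d^2 \log \Gamma(x)/dx^2$ is the trigamma function. The remaining 
elements of the matrix are zero. 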

In an important paper, Fernandez and Steel (1999) revealed some 
pitfalls with a model of the above kind. Under a commonly used 
noninformative prior, they showed that Bayesian inference is precluded for 
certain samples, even though there exists a well-defined conditional 
distribution of the parameters given the observables. They also noted that 
global maximization of the likelihood function is a vacuous exercise since 
the latter becomes unbounded as one tends to the boundary of the 
parameter space. More specifically, let $l(\theta, \sigma, R, \nu)$ be the likelihood 
function for $n$ independent observations $y_i$ assumed to have the t 
distribution with mean vector $g_i(\theta)$, common covariance matrix $\sigma^2 R$, and 
common degrees of freedom $\nu$. For given values of $\theta = \theta_0$, $R = R_0$, and 
$\nu = \nu_0$, let $0 < s(\theta_0) < n$ denote the number of observations for which 
$y_i = g_i(\theta_0)$. Then the following hold. 

(a) If 
$$\nu_0 < \frac{p \, s(\theta_0)}{n - s(\theta_0)},$$
then 
$$\lim_{\sigma \to 0} l(\theta_0, \sigma, R_0, \nu_0) = \infty.$$
(b) If 
$$\nu_0 = \frac{p \, s(\theta_0)}{n - s(\theta_0)},$$
then 
$$\lim_{\sigma \to 0} l(\theta_0, \sigma, R_0, \nu_0) \in (0, \infty).$$
(c) If 
$$\nu_0 > \frac{p \, s(\theta_0)}{n - s(\theta_0)},$$
then 
$$\lim_{\sigma \to 0} l(\theta_0, \sigma, R_0, \nu_0) = 0.$$

It is evident from this result that one can determine a value $\theta_0$ such that 
$y_i = g_i(\theta_0)$ holds for at least one observation and the likelihood function 
does not possess a global maximum. Indeed, for sufficiently small values 
of $\nu_0$, one can make $l(\theta_0, \sigma, R_0, \nu_0)$ arbitrarily large by letting $\sigma$ tend to 
zero. These pitfalls arise as a consequence of the (sometimes neglected) 
fact that the recorded data have zero probability under the assumed 
model. Fernandez and Steel (1999) proposed and illustrated a Bayesian 
analysis on the basis of set of observations that takes into account the 
precision with which the data were originally recorded. 
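
The unboundedness of the likelihood can be seen numerically in the simplest case. The Python sketch below is an illustration under assumptions of our own choosing: a univariate (p = 1) location-scale t likelihood with R = 1, a fixed location, and three made-up observations of which one coincides with the location, so that s = 1, n = 3 and the threshold p s/(n − s) equals 0.5. Here σ is treated as a scale parameter, since the covariance does not exist for ν ≤ 2.

```python
import math


def t_loglik(data, loc, sigma, nu):
    """Log-likelihood of i.i.d. univariate t observations (location, scale)."""
    const = (math.lgamma((nu + 1.0) / 2.0) - math.lgamma(nu / 2.0)
             - 0.5 * math.log(nu * math.pi))
    total = 0.0
    for y in data:
        q = ((y - loc) / sigma) ** 2
        total += (const - math.log(sigma)
                  - 0.5 * (nu + 1.0) * math.log1p(q / nu))
    return total


# One observation equals the location exactly: s = 1, n = 3, p = 1.
data = [0.0, 1.0, -2.0]
for nu in (0.3, 2.0):            # below and above the threshold 0.5
    values = [t_loglik(data, 0.0, sigma, nu) for sigma in (1.0, 1e-3, 1e-6)]
    print(nu, values)
```

For ν = 0.3 the log-likelihood keeps increasing as σ shrinks (case (a)), while for ν = 2 it decreases without bound (case (c)).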
11.4 General Linear Model 


Rubin (1983) and Sutradhar and Ali (1986) considered the general linear 
model set up in the form 

$$Y = \beta X + \epsilon, \qquad (11.14)$$

where $X$ is a $k \times n$ design matrix with rank $k$, $\beta$ is a $p \times k$ matrix of 
regression parameters with unknown values, and $\epsilon$ is a $p \times n$ random 
error matrix. It is assumed that the error variables $\epsilon_{ij}$ satisfy 

$$E(\epsilon_{ij}) = 0, \qquad i = 1, \ldots, p, \; j = 1, \ldots, n,$$
$$E(\epsilon_{ij}^2) = \sigma^2 \lambda_{ii}, \qquad i = 1, \ldots, p, \; j = 1, \ldots, n,$$
$$E(\epsilon_{ij} \epsilon_{lj}) = \sigma^2 \lambda_{il}, \qquad i \neq l, \; i, l = 1, \ldots, p,$$

and
$$E(\epsilon_{ij} \epsilon_{lj'}) = 0, \qquad j \neq j',$$

where $\lambda_{il}$ are unknown parameters. Furthermore, it is assumed that, 
for a given $\sigma$, the errors $\epsilon_1, \ldots, \epsilon_n$ are independently and normally 
distributed, with the distribution of $\epsilon_j = (\epsilon_{1j}, \ldots, \epsilon_{pj})^T$ being $N(0, \sigma^2 \Lambda)$ 
for $j = 1, \ldots, n$, while $\sigma$ is assumed to be a random variable having an 
inverted gamma distribution with the pdf given by 


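$$\frac{2 (\nu/2)^{\nu/2} \, \sigma^{-(\nu + 1)}}{\Gamma(\nu/2)} \exp\left\{ -\frac{\nu}{2\sigma^2} \right\},$$

where $\nu$ is an unknown parameter. Under these assumptions, one can 
show that the joint distribution of error variables is 

$$f(\epsilon) = \frac{(\nu - 2)^{\nu/2} \, \Gamma((\nu + np)/2)}{\pi^{np/2} \, \Gamma(\nu/2) \, |R|^{n/2}} \left[ \nu - 2 + \sum_{j=1}^{n} \epsilon_j^T R^{-1} \epsilon_j \right]^{-(\nu + np)/2},$$

where $R = \nu\Lambda/(\nu - 2)$. It then follows that $E(\epsilon_j) = 0$, $E(\epsilon_j \epsilon_j^T) = R$, 
and $E(\epsilon_j \epsilon_s^T) = 0$ for $j \neq s$, $j, s = 1, \ldots, n$. 

Sutradhar and Ali (1986) provided a least squares estimator for $\beta$ as 
well as moment estimators for $R$ and $\nu$. The least squares estimator is 

$$\hat{\beta} = Y X^T (X X^T)^{-1},$$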


while the moment estimators are given by 

$$\hat{R} = \frac{1}{n} \sum_{j=1}^{n} \left( y_j - \hat{\beta} x_j \right) \left( y_j - \hat{\beta} x_j \right)^T,$$


Pp p 

2 1 
Je. a a2 ~A a2 a4 
p = (5a - DEA) /(spa-irya), 

i=1 i j i=1 i j 
where $\hat{\epsilon}_{ij}$ are the so-called estimated residuals expressed as the difference 
$$\hat{\epsilon}_{ij} = y_{ij} - \sum_{r=1}^{k} \hat{\beta}_{ir} x_{rj}.$$

All three estimators $\hat{\beta}$, $\hat{R}$, and $\hat{\nu}$ are consistent as $n \to \infty$. 

Let $Y = (y_1, \ldots, y_n)$, where $y_j = (y_{1j}, \ldots, y_{pj})^T$. Let $Y^*$ denote 
the stacked random vector corresponding to $Y$, so that 
$Y^* = (y_{11}, \ldots, y_{p1}, y_{12}, \ldots, y_{p2}, \ldots, y_{1n}, \ldots, y_{pn})^T$. Let $\beta^*$ and $\epsilon^*$ be the corresponding 
stacked vectors. Then the model (11.14) can be written in terms 
of Kronecker products as 

$$Y^* = (I_p \otimes X^T) \beta^* + \epsilon^*. \qquad (11.15)$$
Suppose one wishes to test the hypothesis $H_0: \beta^* = \beta_0^*$ versus 
$H_1: \beta^* \neq \beta_0^*$. In the case where $\nu$ and $R$ are known, Sutradhar and Ali 
(1986) showed that a suitable test statistic is 

v fx T -1)71 /2* 
= -ø xT - 65). 
D — (ô 63) {Re (X ) \ (ô o; 

Lower values of this statistic $D$ will favor $H_0$, while higher values will 
direct the rejection of $H_0$. Actually, it can be shown that the pdf of $D$ 
is 


yr/qke/?-) < T ((v + kp)/2 + 23) 


HO = “Poppy Tapti TG +H) 
x (Ad)? (A + v + a) FRPP 25, 
where 
A = <5 (6° - 65)" B~ (6° — 65). 


Note that, under $H_0: \beta^* = \beta_0^*$, $D/(kp)$ has the usual F distribution with 
degrees of freedom $kp$ and $\nu$, whereas the analogous test for the classical 
MANOVA model has the chi-squared distribution with degrees of 
freedom $kp$. Also note that the power of the test changes under $H_1$, whereas 
the similar statistic has the noncentral chi-squared distribution for the 
usual normal model. In the case where $\nu$ and $R$ are not known, since $\hat{\nu}$ 


and R are consistent estimators, an F test based on D = DUTU/(P-2), 
~ ~—1/2 a 
vetea O — 6%), may still be approximately valid. 
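
The distributional claims about $D$ can be checked numerically. The sketch below assumes the series form $f(d) = \{\nu^{\nu/2} d^{kp/2-1}/\Gamma(\nu/2)\} \sum_{j \geq 0} \Gamma((\nu+kp)/2+2j)\,(\Delta d)^j (\Delta+\nu+d)^{-(\nu+kp)/2-2j} / \{j!\, \Gamma(kp/2+j)\}$ for the pdf of $D$, with $\Delta$ the noncentrality. It compares the case $\Delta = 0$ with the density implied by $D/(kp)$ having the F distribution with degrees of freedom $kp$ and $\nu$, and then checks that the series integrates to roughly one when $\Delta > 0$. The function names and the parameter values are illustrative assumptions.

```python
import math


def pdf_d(d, nu, kp, delta, max_terms=500):
    """Series form of the pdf of D (assumed form); delta is the noncentrality."""
    log_front = (0.5 * nu * math.log(nu) + (0.5 * kp - 1.0) * math.log(d)
                 - math.lgamma(0.5 * nu))
    total = 0.0
    for j in range(max_terms):
        if j > 0 and delta == 0.0:
            break  # under H0 only the j = 0 term is nonzero
        a = 0.5 * (nu + kp) + 2.0 * j
        log_term = (log_front + math.lgamma(a) - math.lgamma(j + 1.0)
                    - math.lgamma(0.5 * kp + j) - a * math.log(delta + nu + d))
        if j > 0:
            log_term += j * math.log(delta * d)
        term = math.exp(log_term)
        total += term
        if j > 0 and term < 1e-17 * total:
            break  # remaining terms are negligible
    return total


def pdf_scaled_f(d, nu, kp):
    """Density of D when D / kp has the F distribution with (kp, nu) degrees of freedom."""
    x = d / kp
    log_f = (math.lgamma(0.5 * (kp + nu)) - math.lgamma(0.5 * kp) - math.lgamma(0.5 * nu)
             + 0.5 * kp * math.log(kp / nu) + (0.5 * kp - 1.0) * math.log(x)
             - 0.5 * (kp + nu) * math.log(1.0 + kp * x / nu))
    return math.exp(log_f) / kp


def integrate(f, lo, hi, n=4000):
    """Midpoint rule, adequate for a quick sanity check."""
    h = (hi - lo) / n
    return h * sum(f(lo + (i + 0.5) * h) for i in range(n))


# Under H0 the two densities should agree.
print(pdf_d(3.0, 8.0, 4, 0.0), pdf_scaled_f(3.0, 8.0, 4))
# Under H1 the series should still integrate to approximately one.
print(integrate(lambda d: pdf_d(d, 8.0, 4, 2.5), 0.0, 400.0))
```

The terms are accumulated on the log scale through `math.lgamma` so that large values of $j$ do not overflow.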


Little (1988) extended the general linear model (11.14) to handle 
incomplete data. Estimation is by maximum likelihood (ML) under 
multivariate t and contaminated normal models. ML estimation is carried 
out by means of the EM algorithm and requires only minor modifications 
to the EM algorithm for multivariate normal data. 


11.5 Nonlinear Models 


Nonlinear models involving multivariate t distributed errors have been 
studied relatively recently. Chib et al. (1991) considered nonlinear 
regression models with errors that follow the multivariate t distribution 
with degrees of freedom $\nu$. For an $n \times 1$ vector of observations $y$, the 
model is specified by 

$$
y = h(X, \beta) + \epsilon, \tag{11.16}
$$

where $X$ is an $n \times r$ matrix of regressors, $\beta$ is the regression coefficient 
vector, $h(X, \beta)$ is a vector function of $(X, \beta)$, and $\epsilon$ is the error vector. It 
is assumed that $\epsilon \mid X, \beta, \eta, \tau, \nu$ has an $n$-variate t distribution with zero 
mean vector, covariance matrix $(1/\tau) V(X, \eta)$, and degrees of freedom 
$\nu$. One can see that (11.16) reduces to (11.1) simply by setting $r = p$, 
$h(X, \beta) = X\beta$, and $V(X, \eta) = I_n$. The sampling density resulting from 
(11.16) is a t pdf, which can be represented as the following scale mixture 
of normal pdfs: 

$$
f(y \mid X, \omega) = \int_0^{\infty} f(y \mid X, z, \omega)\, f(z \mid X, \omega)\, dz,
$$

where $f(y \mid X, z, \omega)$ is an $n$-variate normal pdf with mean vector $h(X, \beta)$ 
and covariance matrix $\{1/(z\tau)\} V(X, \eta)$ and $f(z \mid X, \omega)$ is a gamma pdf 
with parameters $(\nu/2, \nu/2)$. Note that the pdf $f(z \mid X, \omega)$ is proper, 
does not depend on $X$, and involves no parameters other than $\nu$. 
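
The scale mixture representation can be verified numerically in the simplest case. The sketch below assumes $n = 1$, $h(X, \beta) = 0$, and $V(X, \eta) = 1$, and reads the gamma parameters $(\nu/2, \nu/2)$ as shape and rate; it integrates the normal pdf with variance $1/(z\tau)$ against the gamma pdf of $z$ and compares the result with the closed-form univariate t pdf. All function names are illustrative.

```python
import math


def normal_pdf(y, var):
    """Normal pdf with mean zero and variance var."""
    return math.exp(-0.5 * y * y / var) / math.sqrt(2.0 * math.pi * var)


def gamma_pdf(z, shape, rate):
    """Gamma pdf in the shape-rate parameterization."""
    return math.exp(shape * math.log(rate) + (shape - 1.0) * math.log(z)
                    - rate * z - math.lgamma(shape))


def t_pdf_by_mixture(y, nu, tau=1.0, upper=60.0, n=20000):
    """Integrate N(y; 0, 1/(z*tau)) against the Gamma(nu/2, nu/2) pdf of z (midpoint rule)."""
    h = upper / n
    total = 0.0
    for i in range(n):
        z = h * (i + 0.5)
        total += normal_pdf(y, 1.0 / (z * tau)) * gamma_pdf(z, 0.5 * nu, 0.5 * nu)
    return h * total


def t_pdf_closed_form(y, nu, tau=1.0):
    """Univariate t pdf with location zero, precision tau, and nu degrees of freedom."""
    log_c = (math.lgamma(0.5 * (nu + 1.0)) - math.lgamma(0.5 * nu)
             + 0.5 * math.log(tau / (nu * math.pi)))
    return math.exp(log_c - 0.5 * (nu + 1.0) * math.log(1.0 + tau * y * y / nu))


# The two values should agree up to the quadrature error.
print(t_pdf_by_mixture(1.3, 5.0), t_pdf_closed_form(1.3, 5.0))
```

The same construction gives a sampler: draw $z$ from the gamma distribution and then draw $y$ from the conditional normal distribution.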

In the classical linear model due to Zellner (1976), the marginal 
posterior of the regression parameter, $\beta$, is unaffected by the multivariate 
t assumption (see Section 11.2). This result was extended by Chib et 
al. (1998), Osiewalski (1991), and Osiewalski and Steel (1990) for 
elliptically distributed errors. For the nonlinear model above, Chib et 
al. (1991) provided the following sufficient conditions under which the 
posterior of $\nu$, $p(\nu \mid y, X)$, coincides with the prior, $p(\nu)$: 


• For proper priors $p(\omega)$, if $\nu$ is independent of $(\beta, \eta, \tau)$, then $\nu$ is 
independent of $(y, X)$. 

• For improper priors of the form $p(\omega) = p(\tau) p(\beta, \eta) p(\nu)$, where 
$p(\tau) \propto 1/\tau$, $\tau > 0$, and $p(\nu)$ is proper and functionally independent of 
$(\tau, \beta, \eta)$, if the posterior of $\nu$ exists, then $p(\nu \mid y, X) = p(\nu)$. 


12 
Applications 


Because of limitations on the size of this book, and since its aim is to 
collect and organize results on multivariate t distributions, this short 
chapter presents a small number of relatively recent applications 
of multivariate t distributions. The treatment is by no means 
exhaustive. Some other applications, in particular those related to Bayesian 
inference, are mentioned in the previous chapters (see Chapters 1, 3, 
5, 10, and 11). 


12.1 Projection Pursuit 


Exploratory projection pursuit is a technique for finding “interesting” 
low p-dimensional projections of high P-dimensional multivariate data; 
see Jones and Sibson (1987) for an introduction. Typically, projection 
pursuit uses a projection index, a functional computed on a projected 
density (or data set), to measure the “interestingness” of the current 
projection and then uses a numerical optimizer to move the projection 
direction to a more interesting position. Loosely speaking, a robust pro- 
jection index is one that prefers projections involving true clusters over 
those consisting of a cluster and an outlier. A good robust projection 
index should perform well even when specific assumptions required for 
“normal operation” fail to hold or hold only approximately. In a paper 
that was awarded the Royal Statistical Society Bronze Medal, Nason 
(2001) developed five new indices, intended to be especially robust, 
that are based on measuring divergence from the multivariate t 
distribution with the joint pdf 

$$
f(x) = \frac{\Gamma((\nu + p)/2)}{\{\pi(\nu - 2)\}^{p/2}\, \Gamma(\nu/2)} \left( 1 + \frac{x^T x}{\nu - 2} \right)^{-(\nu + p)/2}.
$$

The first three indices are all weighted versions of the $L^2$-divergences 
from $f$ for $\nu > 3$. They are given by 


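$$
I_\alpha^{TL2} = \int \{g(x) - f(x)\}^2 f^\alpha(x)\, dx
$$

for $\alpha = 0, 1/2, 1$, where $g$ denotes the density of the projected data. 
Nason (2000) derived an explicit formula for the case 
$\alpha = 0$. The fourth index is the Student's t index defined by 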


I1 = - f peax. 


This index is minimized over all spherical densities by f(x). Specifically, 
it satisfies the inequality 


pil se a Ue pie) 
Yv = 7/2 (y — 2AT (v/2) 


for all spherical densities $g$ with equality if and only if $g = f$ almost 
everywhere. The proof of this result uses the fact that the index can be 
represented as the sum of two F-divergences (Vajda, 1989). Through both 
numerical calculation and explicit analytical formulas, Nason (2001) 
found that the Student's t indices are generally more robust and that 
indices based on $L^2$-divergences are also the most robust in their class. 
A detailed analytical exploration of one of the $L^2$-based indices showed 
that it acts robustly when outliers diverge from a main cluster but 
behaves like a standard projection index when two clusters diverge; that 
is, its behavior automatically changes depending on the degree of outlier 
contamination. The degree of sensitivity to outliers can be reduced by 
increasing the degrees of freedom $\nu$ of the $I^{TL2}$ index to make it behave 
increasingly like Hall's index (Hall, 1989) as $\nu \to \infty$. 

Using the transformation $z = \tan(\theta)$, Nason further developed the 
orthogonal expansion index given by 


2 [7 2 : 
m = i p (200) = Z costo) dð, 


where $g_\theta$ is the pdf of the transformed projected data $X$. Using the 
Fourier series expansion of $g_\theta(\theta)$ on $[-\pi/2, \pi/2]$, 

$$
g_\theta(\theta) = \frac{a_0}{2} + \sum_{n=1}^{\infty} \{a_n \cos(2n\theta) + b_n \sin(2n\theta)\},
$$

where 

$$
a_n = \frac{2}{\pi} \int_{-\pi/2}^{\pi/2} g_\theta(\theta) \cos(2n\theta)\, d\theta
$$

and 

$$
b_n = \frac{2}{\pi} \int_{-\pi/2}^{\pi/2} g_\theta(\theta) \sin(2n\theta)\, d\theta,
$$


the index nie can be expanded as 


2 2 2 
L2 amyl 3 1 1 
a = {3 (0-2) +(a-3) t(e-3 
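
The Fourier coefficients that enter the expansion can be approximated numerically for any density on $(-\pi/2, \pi/2)$. The following is a minimal sketch, assuming the definitions $a_n = (2/\pi) \int g_\theta(\theta) \cos(2n\theta)\, d\theta$ and $b_n = (2/\pi) \int g_\theta(\theta) \sin(2n\theta)\, d\theta$ over $[-\pi/2, \pi/2]$. The test density (the uniform density that arises when the untransformed variable is standard Cauchy) is chosen only for illustration and is not one of Nason's examples.

```python
import math


def fourier_coefficients(g, n_max, grid=20000):
    """Return ([a_0, ..., a_n_max], [b_0, ..., b_n_max]) for a density g on (-pi/2, pi/2).

    Each integral is approximated by the midpoint rule on an equally spaced grid.
    """
    h = math.pi / grid
    pts = [-0.5 * math.pi + (i + 0.5) * h for i in range(grid)]
    vals = [g(t) for t in pts]
    a = [2.0 / math.pi * h * sum(v * math.cos(2 * n * t) for v, t in zip(vals, pts))
         for n in range(n_max + 1)]
    b = [2.0 / math.pi * h * sum(v * math.sin(2 * n * t) for v, t in zip(vals, pts))
         for n in range(n_max + 1)]
    return a, b


def theta_density_of_cauchy(t):
    """If x is standard Cauchy, theta = arctan(x) is uniform on (-pi/2, pi/2)."""
    return 1.0 / math.pi


a, b = fourier_coefficients(theta_density_of_cauchy, 3)
print(a)  # a_0 is close to 2/pi; the remaining a_n are close to 0
print(b)  # all close to 0
```

For sample data, the integrals would typically be replaced by averages of $\cos(2n\theta_i)$ and $\sin(2n\theta_i)$ over the transformed observations $\theta_i$.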


12.2 Portfolio Optimization 


There are a number of places in finance where robust estimation has 
been used. For example, when a stock's returns are regressed on the 
market returns, the slope coefficient, called beta, is a measure of the 
relative riskiness of the stock in comparison to the market. Quite often, 
this regression is performed using robust procedures. However, 
there appear to be fewer applications of robust estimation in the area 
of portfolio optimization. In the problem of finding a risk-minimizing 
portfolio subject to linear constraints, the classical approach invariably 
assumes normality. Lauprete et al. (2002) addressed the 
problem when the return data are generated by a multivariate 
distribution that is elliptically symmetric but not necessarily normal. They 
showed that when the returns have marginal heavy tails and multivariate 
tail dependence, portfolios will also have heavy tails, and the classical 
procedures will be susceptible to outliers. They showed theoretically, 
and on simulated data, that robust alternatives have lower risk. In 
particular, they showed that when returns have a multivariate t distribution 
with degrees of freedom less than 6, the least absolute deviation (LAD) 
estimator has an asymptotically lower risk than the one based on the 
classical approach. The proposed methodology is applicable when heavy 
tails and tail dependence in financial markets are documented, especially 
at high sampling frequencies. 


12.3 Discriminant and Cluster Analysis 


In the past, there have been many attempts to modify existing methods 
of discriminant and cluster analyses to provide robust procedures. Some 
of these have been of a rather ad hoc nature. Recently the multivariate 
t distribution has been employed for robust estimation. Suppose, 
for simplicity, that one utilizes two samples in order to assign a new 
observation into one of two groups, and consider the joint distribution 

$$
f(X_1^*, X_2^*) = \frac{(\nu - 2)^{\nu/2}\, \Gamma((\nu + np)/2)}{\pi^{np/2}\, \Gamma(\nu/2)\, |R|^{n/2}} \left[ (\nu - 2) + \sum_{i=1}^{2} \sum_{j=1}^{n_i} (x_{ij} - \mu_i)^T R^{-1} (x_{ij} - \mu_i) \right]^{-(\nu + np)/2} \tag{12.1}
$$

of the two samples $X_1^* = (x_{11}, \ldots, x_{1n_1})$ and $X_2^* = (x_{21}, \ldots, x_{2n_2})$ of 
sizes $n_1$ and $n_2$, respectively. In (12.1), $n = n_1 + n_2$. The $(n_1 + n_2)p$-dimensional 
t distribution (12.1) was proposed by Sutradhar (1990). It 
is evident that the marginals are distributed according to 

$$
f(x_{ij}) = \frac{(\nu - 2)^{\nu/2}\, \Gamma((\nu + p)/2)}{\pi^{p/2}\, \Gamma(\nu/2)\, |R|^{1/2}} \left[ (\nu - 2) + (x_{ij} - \mu_i)^T R^{-1} (x_{ij} - \mu_i) \right]^{-(\nu + p)/2}, \tag{12.2}
$$

which is a slight reparameterization of the usual multivariate t pdf. Let 
$\pi_1$ and $\pi_2$ denote the two t-populations of the form (12.2) with 
parameters $(\mu_1, R, \nu)$ and $(\mu_2, R, \nu)$, respectively. Fisher's optimal 
discrimination criterion is robust against departure from normality (Sutradhar, 
1990), and it assigns the new observation with measurement $x$ to $\pi_1$ if 

$$
d(x) = (\mu_1 - \mu_2)^T R^{-1} x - \frac{1}{2} (\mu_1 - \mu_2)^T R^{-1} (\mu_1 + \mu_2) > 0;
$$

otherwise, it assigns the observation to $\pi_2$. But even though the 
classification is based on the robust criterion, the probability of 
misclassification depends on the degrees of freedom of the t distribution. If $e_1$ 
and $e_2$ are the probabilities of misclassification of an individual observation 
from $\pi_1$ into $\pi_2$ and from $\pi_2$ into $\pi_1$, respectively, then 


$$
e_i = \frac{\Gamma((\nu + 1)/2)}{\{\pi(\nu - 2)\}^{1/2}\, \Gamma(\nu/2)} \int_{-\infty}^{-\Delta/2} \left( 1 + \frac{z^2}{\nu - 2} \right)^{-(\nu + 1)/2} dz
$$

for $i = 1, 2$, where $\Delta^2 = (\mu_1 - \mu_2)^T R^{-1} (\mu_1 - \mu_2)$. Calculations of $e_1$ 
and $e_2$ for selected values of $\Delta$ and $\nu$ (Sutradhar, 1990) suggest that if a 
sample actually comes from a t-population (12.2) with degrees of freedom 
$\nu$, then the evaluation of the classification error rates by normal-based 
probabilities would unnecessarily make an experimenter more suspicious. 
Sutradhar (1990) illustrated the use of the preceding discrimination 
approach by fitting the t distribution to some bivariate data on two species 
of flea beetles. 
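
The discrimination rule and the dependence of the error rate on the degrees of freedom can be sketched as follows. The code assumes the bivariate case ($p = 2$) and uses the fact that, under (12.2), the standardized discriminant score is a univariate t variate with unit variance, so that each error rate equals the probability that such a variate falls below $-\Delta/2$. The parameter values are hypothetical and are not the flea beetle data analyzed by Sutradhar (1990).

```python
import math


def solve_2x2(r, v):
    """Solve R w = v for a 2 x 2 matrix R (bivariate case only)."""
    det = r[0][0] * r[1][1] - r[0][1] * r[1][0]
    return [(r[1][1] * v[0] - r[0][1] * v[1]) / det,
            (r[0][0] * v[1] - r[1][0] * v[0]) / det]


def discriminant(x, mu1, mu2, r):
    """Fisher's linear discriminant d(x); a positive value assigns x to population 1."""
    w = solve_2x2(r, [mu1[0] - mu2[0], mu1[1] - mu2[1]])   # R^{-1} (mu1 - mu2)
    mid = [0.5 * (mu1[0] + mu2[0]), 0.5 * (mu1[1] + mu2[1])]
    return w[0] * (x[0] - mid[0]) + w[1] * (x[1] - mid[1])


def unit_variance_t_pdf(u, nu):
    """Univariate t pdf with nu > 2 degrees of freedom, scaled to unit variance."""
    log_k = (math.lgamma(0.5 * (nu + 1.0)) - math.lgamma(0.5 * nu)
             - 0.5 * math.log(math.pi * (nu - 2.0)))
    return math.exp(log_k - 0.5 * (nu + 1.0) * math.log1p(u * u / (nu - 2.0)))


def error_rate_t(delta, nu, n=20000):
    """P(U < -delta/2) for a unit-variance t variate U, by the midpoint rule."""
    h = 0.5 * delta / n
    area = h * sum(unit_variance_t_pdf(h * (i + 0.5), nu) for i in range(n))
    return 0.5 - area   # uses the symmetry of the pdf about zero


def error_rate_normal(delta):
    """Normal-based counterpart, Phi(-delta/2)."""
    return 0.5 * math.erfc(0.5 * delta / math.sqrt(2.0))


r = [[1.0, 0.3], [0.3, 1.0]]
print(discriminant([1.0, 0.5], [1.0, 1.0], [-1.0, -1.0], r))   # positive: assign to population 1
print(error_rate_t(2.0, 5.0), error_rate_normal(2.0))
```

With $\Delta = 2$ and $\nu = 5$ in this sketch, the t-based error rate is smaller than the normal-based one, which illustrates how the two evaluations can differ.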

McLachlan and Peel (1998), McLachlan et al. (1999), and Peel and 
McLachlan (2000) used a mixture model of t distributions as a robust 
method of mixture estimation for clustering. They illustrated its 
usefulness by a cluster analysis of a simulated data set with added background 
noise and of an actual data set. For other recent methods for making 
cluster algorithms robust, see Smith et al. (1993), Davé and 
Krishnapuram (1995), Jolion et al. (1995), Frigui and Krishnapuram (1996), 
Kharin (1996), Rousseeuw et al. (1996), and Zhuang et al. (1996). 


12.4 Multiple Decision Problems 


The multivariate t distribution arises quite naturally in multiple decision 
problems. In fact, it is one of the earliest applications of this 
distribution in statistical inference. Suppose there are $q$ dependent variates with 
means $\theta_1, \ldots, \theta_h, \ldots, \theta_q$, respectively, and that one has estimators $\hat{\theta}_h$ of 
$\theta_h$, $h = 1, \ldots, q$, available, which are jointly distributed according to a 
$q$-variate normal distribution with means $\theta_h$, $h = 1, \ldots, q$, and covariance 
matrix $\sigma^2 R$, where $R$ is a $q \times q$ positive definite matrix and $\sigma^2$ is an 
unknown scale parameter. Let $s^2$ be an unbiased estimator of $\sigma^2$ such 
that $s^2$ is independent of the $\hat{\theta}_h$'s and $\nu s^2/\sigma^2$ has the chi-squared 
distribution with degrees of freedom $\nu$. Consider $p < q$ linearly independent 
linear combinations of the $\theta_h$'s, 


$$
m_i = \sum_{h=1}^{q} c_{ih} \theta_h = c_i^T \theta,
$$

for $i = 1, \ldots, p$, where $c_i = (c_{i1}, \ldots, c_{ih}, \ldots, c_{iq})^T$ is a $q \times 1$ vector of 
known constants. The unbiased estimators of the $m_i$'s are 

$$
\hat{m}_i = \sum_{h=1}^{q} c_{ih} \hat{\theta}_h = c_i^T \hat{\theta},
$$

each of which is a normally distributed random variable with mean $m_i$ 
and variance $\sigma^2 c_i^T R c_i$. Then 

$$
Y_i = \frac{\hat{m}_i - m_i}{s \sqrt{c_i^T R c_i}}, \qquad i = 1, \ldots, p,
$$

is a Student's t-variate and $Y_1, \ldots, Y_p$ have the usual $p$-variate t 
distribution with degrees of freedom $\nu$, zero means, and the correlation matrix 
$\{\delta_{iu}\}$ given by 

$$
\delta_{iu} = \frac{c_i^T R c_u}{\sqrt{c_i^T R c_i \; c_u^T R c_u}}.
$$


For multiple comparisons, one computes the one- and two-sided 
confidence interval estimates of $m_i$ ($i = 1, \ldots, p$) simultaneously with a joint 
confidence coefficient $1 - \alpha$, say. These estimates are given by (Dunnett, 
1955) 

$$
\hat{m}_i + h_1 s \sqrt{c_i^T R c_i}
$$

and 

$$
\hat{m}_i \pm h_2 s \sqrt{c_i^T R c_i},
$$

respectively, where the constants $h_1$ and $h_2$ are determined so that the 
intervals in each case have a joint coverage probability of $1 - \alpha$. The 
constants $h_1$ and $h_2$ can be computed using the methods discussed in 
Chapter 8. 
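
The quantities entering these simultaneous intervals are simple to compute once the contrast vectors and $R$ are given. The following is a minimal sketch, assuming a hypothetical many-to-one comparison (three treatments against a control with independent group means and equal sample sizes); it returns the statistics $Y_i$ and the correlation matrix $\{\delta_{iu}\}$ but does not compute the constants $h_1$ and $h_2$, which require multivariate t probability integrals.

```python
import math


def quad_form(c, r, d):
    """Return c^T R d for lists c, d and a square matrix R."""
    return sum(c[a] * r[a][b] * d[b] for a in range(len(c)) for b in range(len(d)))


def t_statistics(m_hat, m0, contrasts, r, s):
    """Y_i = (m_hat_i - m_i) / (s * sqrt(c_i^T R c_i)) for each contrast c_i."""
    return [(mh - m) / (s * math.sqrt(quad_form(c, r, c)))
            for mh, m, c in zip(m_hat, m0, contrasts)]


def correlation_matrix(contrasts, r):
    """delta_iu = c_i^T R c_u / sqrt(c_i^T R c_i * c_u^T R c_u)."""
    return [[quad_form(ci, r, cu)
             / math.sqrt(quad_form(ci, r, ci) * quad_form(cu, r, cu))
             for cu in contrasts] for ci in contrasts]


# Hypothetical setting: q = 4 group means (one control, three treatments), each
# based on 10 observations, so that R = diag(1/10); p = 3 contrasts with the control.
r = [[0.1 if a == b else 0.0 for b in range(4)] for a in range(4)]
contrasts = [[-1.0, 1.0, 0.0, 0.0], [-1.0, 0.0, 1.0, 0.0], [-1.0, 0.0, 0.0, 1.0]]
print(correlation_matrix(contrasts, r))   # off-diagonal entries are close to 0.5
```

The common correlation of one half is specific to this balanced design; unequal sample sizes or correlated estimators lead to a general correlation matrix.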


12.5 Other Applications 


Bayesian prediction approaches using the multivariate t distribution 
have attracted wide-ranging applications in the last several decades, 
and many sources are available in the periodical and monograph literature. 
Chien (2002) discusses applications in speech recognition and online 
environmental learning. In experiments on hands-free car speech 
recognition of connected Chinese digits, it was shown that the proposed 
approach is significantly better than conventional approaches. Blattberg 
and Gonedes (1974) were among the first to discuss applications to 
security returns data. For other applications, we refer the reader to the 
numerous modern books on multivariate analysis and to the Proceedings 
of the Valencia International Meetings. 